Xingxiangrui opened 6 months ago
As we all know, Mixtral already supports rope_theta: https://arxiv.org/abs/2310.05209. However, it does not support the rope_scaling parameter. Will Mixtral support a rope_scaling param like LLaMA does?
"rope_scalling":{ "factor" : 4.0, "type": "linear" },
Or just set it to null for the current model:
"rope_scalling":null
If it supported the rope_scaling param, we could merge LLaMA and Mistral models into Mixtral-MoE without modifying Mixtral's source code:
https://github.com/cg123/mergekit/issues/88
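For illustration, here is a minimal sketch of what such a Mixtral config.json entry might look like if it mirrored the LLaMA convention. This is an assumption, not the current Mixtral schema; the rope_theta value shown is also just a placeholder:

```python
import json

# Hypothetical Mixtral config fragment, assuming it adopted the
# LLaMA-style rope_scaling schema requested in this issue.
config = {
    "model_type": "mixtral",
    "rope_theta": 1000000.0,   # already supported today
    "rope_scaling": {          # requested parameter (LLaMA-style)
        "type": "linear",
        "factor": 4.0,
    },
}

# Setting it to null (None in Python) would keep current behavior.
config_no_scaling = dict(config, rope_scaling=None)

print(json.dumps(config, indent=2))
```

With such a schema, a merge tool like mergekit could read the same rope_scaling key from both LLaMA and Mixtral configs without model-specific patches.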