Thanks for the great work. I was wondering about the MoE support for the Mixtral model: have you actually used it for your model, or was it implemented just for testing? I see you have scripts for llama and vicuna. Can we use MoE with Vicuna as well?
elif "mixtral" in model_args.model_name_or_path.lower():
model = LlavaMixtralForCausalLM.from_pretrained(
model_args.model_name_or_path,
cache_dir=training_args.cache_dir,
attn_implementation=attn_implementation,
torch_dtype=(torch.bfloat16 if training_args.bf16 else None),
**bnb_model_from_pretrained_args
)
from deepspeed.utils import set_z3_leaf_modules
set_z3_leaf_modules(model, [MixtralSparseMoeBlock])
Sorry for the delayed reply. That code path is just for testing, and we didn't incorporate Mixtral into training. However, in our opinion, the MoE adapter should be able to be combined with any LLM.
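For reference, here is a minimal sketch of what a Vicuna branch could look like, mirroring the Mixtral branch quoted above. This is not from the repo: the LlavaLlamaForCausalLM class and the "vicuna" name check are assumptions (the llama/vicuna scripts suggest a Llama-based wrapper exists, but the exact class name may differ). Since Vicuna is a dense Llama-based model with no sparse MoE block in the backbone, the set_z3_leaf_modules call (which keeps DeepSpeed ZeRO-3 from hanging on the conditionally executed experts inside MixtralSparseMoeBlock) would not be needed here.

# Hypothetical sketch, not the repo's actual code.
elif "vicuna" in model_args.model_name_or_path.lower():
    # Assumed Llama-based wrapper class, analogous to LlavaMixtralForCausalLM.
    model = LlavaLlamaForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        cache_dir=training_args.cache_dir,
        attn_implementation=attn_implementation,
        torch_dtype=(torch.bfloat16 if training_args.bf16 else None),
        **bnb_model_from_pretrained_args
    )
    # No set_z3_leaf_modules call: Vicuna's backbone is dense, so there is no
    # sparse MoE block to register as a ZeRO-3 leaf module. The MoE adapter
    # would then be attached on top of this dense backbone as usual.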
Thanks.