Thanks for the great work. I was wondering about the MoE support for the Mixtral model: have you actually used it for your model, or was it implemented just for testing? I see you have scripts for llama and vicuna. Can we use MoE with Vicuna as well?
elif "mixtral" in model_args.model_name_or_path.lower():
model = LlavaMixtralForCausalLM.from_pretrained(
model_args.model_name_or_path,
cache_dir=training_args.cache_dir,
attn_implementation=attn_implementation,
torch_dtype=(torch.bfloat16 if training_args.bf16 else None),
**bnb_model_from_pretrained_args
)
from deepspeed.utils import set_z3_leaf_modules
set_z3_leaf_modules(model, [MixtralSparseMoeBlock])
Sorry for the delayed reply. That code path is just for testing, and we didn't incorporate Mixtral into training. However, in our opinion, the MoE adapter should be able to be combined with any LLM.
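For reference, here is a minimal sketch of what a Vicuna branch could look like, mirroring the Mixtral branch quoted above. This is not from the repo: the LlavaLlamaForCausalLM class and the "vicuna" name check are assumptions (the llama/vicuna scripts suggest a Llama-based wrapper exists, but the exact class name may differ). Since Vicuna is a dense Llama-based model with no sparse MoE block in the backbone, the set_z3_leaf_modules call (which keeps DeepSpeed ZeRO-3 from hanging on the conditionally executed experts inside MixtralSparseMoeBlock) would not be needed here.

# Hypothetical sketch, not the repo's actual code.
elif "vicuna" in model_args.model_name_or_path.lower():
    # Assumed Llama-based wrapper class, analogous to LlavaMixtralForCausalLM.
    model = LlavaLlamaForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        cache_dir=training_args.cache_dir,
        attn_implementation=attn_implementation,
        torch_dtype=(torch.bfloat16 if training_args.bf16 else None),
        **bnb_model_from_pretrained_args
    )
    # No set_z3_leaf_modules call: Vicuna's backbone is dense, so there is no
    # sparse MoE block to register as a ZeRO-3 leaf module. The MoE adapter
    # would then be attached on top of this dense backbone as usual.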
Thanks.