Open: Freddie-wei opened this issue 11 months ago
Any update on this issue? @Freddie-wei
I think I found the reason. You may refer to https://github.com/microsoft/Megatron-DeepSpeed/issues/164#issuecomment-1827714843
@Freddie-wei, did you find a solution to this issue?
Hello, did you find a solution to this problem?
Describe the bug
I have encountered several issues while attempting to combine the MoE technique with LoRA fine-tuning of the LLaMA-2 model using DeepSpeed. I am using DeepSpeed ZeRO stage 2, as stage 3 does not support MoE.
The problems arise when I pass the model parameters to the optimizer and then initialize DeepSpeed with that optimizer. Initially, I received the errors "all params in moe group must be moe params" and "'Parameter' object has no attribute 'group_name'". To work around these, I added the allreduce and group_name attributes to all parameters, following the implementation logic of is_moe_param() and split_params_into_different_moe_groups_for_optimizer() in the MoE module.
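For reference, the path DeepSpeed seems to intend here is to let its helper split MoE parameters into separate optimizer param groups rather than tagging every parameter by hand. Below is a minimal sketch of that approach, assuming a LoRA-wrapped model named model and a ZeRO stage 2 config file named ds_config_zero2.json (both names are placeholders, not from my actual setup):

```python
import torch
import deepspeed
from deepspeed.moe.utils import split_params_into_different_moe_groups_for_optimizer

# `model` is assumed to be the LoRA-wrapped LLaMA-2 model containing MoE layers.
trainable_params = [p for p in model.parameters() if p.requires_grad]

# Start from a single param group; the helper moves parameters that carry the
# MoE attributes (allreduce=False, group_name=...) into their own groups so
# ZeRO stage 2 can reduce them over the expert data parallel group.
param_groups = split_params_into_different_moe_groups_for_optimizer(
    [{"params": trainable_params, "name": "lora_and_dense"}]
)

optimizer = torch.optim.AdamW(param_groups, lr=1e-4)

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    config="ds_config_zero2.json",  # placeholder ZeRO stage 2 config
)
```

With this, only parameters that DeepSpeed itself marked as expert parameters end up in MoE groups, which is why forcing the attributes onto every parameter leads to the group lookup failure described below.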
However, I am now encountering the error "AssertionError: expert data parallel group is not initialized". Please find a screenshot of the specific error below. I would appreciate assistance in resolving this problem, or guidance on how to approach it. Thank you.
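For context, the expert data parallel groups that this assertion refers to are normally created when a deepspeed.moe.layer.MoE module is constructed, so parameters tagged with group_name by hand will fail the lookup unless matching process groups actually exist. A minimal sketch of a block built on DeepSpeed's own MoE layer (sizes and names here are illustrative assumptions, not my actual model):

```python
import torch.nn as nn
import deepspeed
from deepspeed.moe.layer import MoE

class ToyMoEBlock(nn.Module):
    """Constructing deepspeed.moe.layer.MoE is what creates the expert
    (data) parallel process groups that this assertion checks for."""

    def __init__(self, hidden_size=4096, num_experts=4, ep_size=1):
        super().__init__()
        expert = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )
        # torch.distributed must already be initialized (e.g. via
        # deepspeed.init_distributed()), and ep_size must divide the world size.
        # Expert parameters created here are tagged with allreduce=False and a
        # group_name automatically.
        self.moe = MoE(hidden_size=hidden_size,
                       expert=expert,
                       num_experts=num_experts,
                       ep_size=ep_size,
                       k=1)

    def forward(self, hidden_states):
        output, _l_aux, _exp_counts = self.moe(hidden_states)
        return output
```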
To Reproduce
Steps to reproduce the behavior:
Expected behavior
No bug.
Screenshots