Open marsggbo opened 6 months ago
it seems that the provided code is based on a single GPU. Any tutorials for finetuning mistral-moe with expert/data/pipeline parallel?
it seems that the provided code is based on a single GPU. Any tutorials for finetuning mistral-moe with expert/data/pipeline parallel?