huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0
129.67k stars 25.76k forks

Support on Mixture of expert models #14814

Open fym0503 opened 2 years ago

fym0503 commented 2 years ago

Hi, I find that there is emerging work in the field of NLP on Mixture-of-Experts-based models, such as Switch Transformers from Google. However, I do not find any such Mixture-of-Experts models in Hugging Face Transformers. Do you have plans to support such models? Thanks!
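For context, the core idea behind Switch-Transformer-style MoE is to replace a dense feed-forward block with several expert FFNs and a learned router that sends each token to a single expert (top-1 routing). The sketch below is a minimal, hedged illustration of that mechanism, not any library's actual implementation; all class and variable names here are made up for the example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchMoELayer(nn.Module):
    """Minimal sketch of a Switch-Transformer-style MoE feed-forward layer:
    a router picks one expert per token (top-1 routing). Illustrative only."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate_logits = self.router(x)                    # (num_tokens, num_experts)
        probs = F.softmax(gate_logits, dim=-1)
        top_prob, top_idx = probs.max(dim=-1)           # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                # Scale by the gate probability so the router receives gradient.
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out


layer = SwitchMoELayer(d_model=8, d_ff=16, num_experts=4)
tokens = torch.randn(5, 8)
output = layer(tokens)  # same shape as the input: (5, 8)
```

Only one expert runs per token, so compute per token stays roughly constant while total parameter count grows with the number of experts — which is why pretrained MoE checkpoints are much larger than their dense counterparts at similar inference cost.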

NielsRogge commented 2 years ago

Hi,

It's true, but as long as there are no pretrained weights, chances are small that such models get added. There are some open-source implementations of MoE available.

Once there are some pretrained weights available somewhere, be sure to let us know!

fym0503 commented 2 years ago

Hi, I found a very recent implementation of MoE with code and pretrained weights, for your reference: https://github.com/pytorch/fairseq/tree/main/examples/moe_lm

NielsRogge commented 2 years ago

Indeed, hot off the press! Let's add them!

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

cerisara commented 2 years ago

Any progress on this front? Thanks...

NielsRogge commented 2 years ago

cc'ing @patil-suraj here

patil-suraj commented 2 years ago

Hey @cerisara! We plan to add moe_lm to Transformers, but I don't have much bandwidth to work on it. If you or anyone else in the community is interested in adding it, I would be more than happy to help :)

cerisara commented 2 years ago

Hi @patil-suraj, thanks! I'm interested in these models and would like to contribute, but I'm afraid my bandwidth is too limited as well, at least for now, sorry ;-)