pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

BSD 3-Clause "New" or "Revised" License

Expert parallelism / MoE example would be awesome :) #62

Open andersonbcdefg opened 6 months ago

andersonbcdefg commented 6 months ago

I loved seeing the blog post with a simple, standalone implementation of many techniques used in production to speed up LLMs. I would love to see this extended to MoE models like Mixtral, which at the moment seem fairly annoying to use and hack on. I'm curious how torch.compile can help with these, and what issues might arise, such as graph breaks due to gating.

yanboliang commented 5 months ago

@andersonbcdefg We have added support for the Mixtral-8x7B MoE; please check https://github.com/pytorch-labs/gpt-fast/pull/71. Feel free to try it and share feedback.
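
For anyone landing on this thread with the graph-break question in mind, here is a minimal sketch of one way to write top-k gating without data-dependent Python control flow. It is not the code from the PR linked above. The class name `TopKMoE`, its parameters, and the toy sizes are all made up for illustration. The idea is to stack the expert weights into one tensor and gather them by the router's indices, so the forward pass is plain tensor ops rather than a Python loop that branches on which experts were selected.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Toy top-k MoE feed-forward layer (hypothetical, for illustration only)."""

    def __init__(self, dim: int, hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: one logit per expert for every token.
        self.gate = nn.Linear(dim, num_experts, bias=False)
        # Expert weights stacked along a leading expert dimension.
        self.w1 = nn.Parameter(torch.randn(num_experts, hidden, dim) * 0.02)
        self.w2 = nn.Parameter(torch.randn(num_experts, dim, hidden) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [tokens, dim]
        scores = self.gate(x)                                   # [T, E]
        weights, idx = torch.topk(scores, self.top_k, dim=-1)   # both [T, K]
        weights = F.softmax(weights, dim=-1)                    # normalize over the K picks
        # Gather the selected experts' weights with tensor indexing,
        # instead of looping over experts in Python.
        w1 = self.w1[idx]                                       # [T, K, H, D]
        w2 = self.w2[idx]                                       # [T, K, D, H]
        h = F.silu(torch.einsum("td,tkhd->tkh", x, w1))         # [T, K, H]
        out = torch.einsum("tkh,tkdh->tkd", h, w2)              # [T, K, D]
        # Weighted sum of the K expert outputs per token.
        return torch.einsum("tkd,tk->td", out, weights)         # [T, D]
```

A module like this can be passed to `torch.compile` as usual; whether it ends up as a single graph depends on the PyTorch version, so treat this as a starting point. The trade-off is that the gather copies expert weights per token, which is reasonable for small-batch decoding but memory-heavy for large batches.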