xrsrke / pipegoose

Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
MIT License

Issue #10: Kernel Fusion using torch.jit #36

Open sami-bg opened 9 months ago

sami-bg commented 9 months ago

From the issue description:

Fuse some commonly used functions with torch.jit, and automatically replace modules in an existing 🤗 transformers model with their corresponding fused modules.
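To make the idea concrete, here is a minimal sketch of both halves: a scripted elementwise function and a recursive module swap. It is not pipegoose's implementation. The names `fused_gelu`, `FusedGELU`, `FUSED_MAP`, and `replace_with_fused` are hypothetical, and the toy `nn.Sequential` stands in for a real 🤗 transformers model. The GeLU uses the tanh approximation, so outputs match `nn.GELU()` only approximately.

```python
import torch
import torch.nn as nn


@torch.jit.script
def fused_gelu(x: torch.Tensor) -> torch.Tensor:
    # Tanh approximation of GeLU; 0.79788456 is sqrt(2 / pi).
    # Scripting gives the JIT a chance to fuse these elementwise ops.
    return x * 0.5 * (1.0 + torch.tanh(0.79788456 * x * (1.0 + 0.044715 * x * x)))


class FusedGELU(nn.Module):
    """Hypothetical drop-in replacement for nn.GELU (illustration only)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return fused_gelu(x)


# Hypothetical registry: original module type -> fused replacement type.
FUSED_MAP = {nn.GELU: FusedGELU}


def replace_with_fused(model: nn.Module) -> nn.Module:
    """Recursively swap every registered module type for its fused version."""
    for name, child in list(model.named_children()):
        fused_cls = FUSED_MAP.get(type(child))
        if fused_cls is not None:
            setattr(model, name, fused_cls())
        else:
            replace_with_fused(child)
    return model


# Toy model standing in for a transformers model (assumption).
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.GELU(),
    nn.Sequential(nn.Linear(8, 8), nn.GELU()),
    nn.Linear(8, 2),
)
x = torch.randn(3, 4)

with torch.no_grad():
    reference_out = model(x)   # output with the original nn.GELU modules
    replace_with_fused(model)  # in-place swap
    fused_out = model(x)       # output with the scripted replacements
```

A type-keyed registry keeps the traversal generic: supporting another fused op would only mean adding one more entry to the map. Whether the JIT actually fuses the kernels depends on the backend and PyTorch version, so any speedup should be measured rather than assumed.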

TODOs

Reading (could be ignored)

- OSLO’s kernel fusion [[link]](https://github.com/tunib-ai/oslo/blob/88dcca0441a605b462bf825cb0104bc692f14c57/oslo/fused_kernels_utils.py#L259)
- GPT-NeoX’s kernel fusion [[link]](https://github.com/EleutherAI/gpt-neox/blob/b02d98932f95fe0500c28698b38acb175e92e980/megatron/model/activations.py#L27)