xrsrke / pipegoose

Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
MIT License

WIP: Starting cuda setup #32

Open isamu-isozaki opened 9 months ago

isamu-isozaki commented 9 months ago

This is a draft PR. I am porting from colossalai. The goal of this PR is to be able to run at least 1D attention with CUDA kernels, as well as, optionally, Triton and JIT-compiled kernels. The steps needed for this are:
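One way the porting could start is by JIT-building the ported kernel on first use with `torch.utils.cpp_extension.load`, so no separate build step is shipped. A minimal sketch, assuming hypothetical file names under `pipegoose/kernels/` (the `load` API is real PyTorch; the extension name and source paths are assumptions):

```python
# Sketch: lazily JIT-compile a CUDA attention kernel ported from colossalai.
# Requires torch and a CUDA toolchain at runtime; nothing extra to pip-install.

def build_attention_extension():
    from torch.utils.cpp_extension import load  # real torch API

    return load(
        name="pipegoose_attention",                 # hypothetical extension name
        sources=[
            "pipegoose/kernels/attention.cpp",      # hypothetical C++ binding
            "pipegoose/kernels/attention_cuda.cu",  # hypothetical CUDA kernel
        ],
        verbose=True,
    )
```

Calling `build_attention_extension()` once at import time (and caching the result) would give the "just works after pip install" behavior, at the cost of a compile on first run.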

xrsrke commented 9 months ago

Thanks. @isamu-isozaki it would be cool if users only needed to install pipegoose through pip once. That's all, no need to install anything else in order to use the CUDA kernels.
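One way to get that pip-install-once behavior is a dispatch that prefers a compiled CUDA extension when it is present and silently falls back to a portable implementation otherwise. A minimal sketch, where `pipegoose._C.gelu` is a hypothetical extension name and the fallback is a scalar stand-in for the real tensor version:

```python
import math


def _load_cuda_gelu():
    """Try a compiled CUDA extension; return None if it isn't installed."""
    try:
        from pipegoose._C import gelu as cuda_gelu  # hypothetical extension
        return cuda_gelu
    except ImportError:
        return None


def _fallback_gelu(x):
    # Exact (erf-based) GELU on a plain float; a real fallback would operate
    # on tensors, this scalar version just shows the shape of the idea.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Users get whichever is available; `pip install pipegoose` alone suffices.
gelu = _load_cuda_gelu() or _fallback_gelu
```

With this pattern, missing CUDA kernels degrade to a slower path instead of an import error, which is what "no need to install anything else" would require.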

Also, since Triton is superior to JIT-compiled kernels, maybe we'll write all these activation functions in Triton.
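A Triton activation kernel along those lines might look like the sketch below. The Triton part needs the `triton` package and a GPU, so it is guarded; the pure-Python reference underneath is what the kernel's output would be validated against (the kernel name and block-size parameter are assumptions):

```python
try:
    import triton
    import triton.language as tl
    HAS_TRITON = True
except ImportError:  # keep the module importable without triton / a GPU
    HAS_TRITON = False

if HAS_TRITON:
    @triton.jit
    def relu_kernel(x_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
        # One program instance handles one BLOCK-sized chunk of the tensor.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK + tl.arange(0, BLOCK)
        mask = offsets < n_elements  # guard the ragged last block
        x = tl.load(x_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, tl.maximum(x, 0.0), mask=mask)


def relu_reference(xs):
    # Pure-Python reference to check the kernel's output against.
    return [max(x, 0.0) for x in xs]
```

Launching would use a 1-D grid over CUDA tensors, e.g. `relu_kernel[(triton.cdiv(n, 1024),)](x, out, n, BLOCK=1024)`.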

xrsrke commented 9 months ago

@isamu-isozaki also, which kernels will you port?

isamu-isozaki commented 9 months ago

@xrsrke Ah well, it depends on how good the Triton kernels we make turn out to be. Sometimes torch.compile can give better results than Triton. For now, I want to experiment with porting just the attention kernel from colossalai and its dependencies. Ideally, like you said, a single pip install should do the job.