Open · shikhartuli opened this issue 4 months ago
@shikhartuli, not sure if you are still working on this, but I have added a compilable version of scattermoe here: https://github.com/mayank31398/kernel-hyperdrive/blob/04e1dd2c6eb0154eab519cc91f5a2c9a3321c105/khd/scattermoe/triton_implementation/__init__.py#L37
Also, you seem familiar 🤔
I am still testing it, but it seems to work without any graph breaks.
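For anyone who wants to verify this themselves, here is a minimal sketch of how one might check for graph breaks under `torch.compile` (assuming PyTorch 2.x; the `Sequential` block below is a stand-in for the actual scattermoe module from the repo above):

```python
import torch
import torch._dynamo as dynamo

# Stand-in for the scattermoe block under test (the real module comes from the repo above).
moe = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
).cuda()
x = torch.randn(8, 4096, device="cuda")

# Option 1: fullgraph=True makes torch.compile raise on the first graph break.
compiled_moe = torch.compile(moe, fullgraph=True)
_ = compiled_moe(x)

# Option 2: torch._dynamo.explain reports how many graphs and graph breaks were produced.
explanation = dynamo.explain(moe)(x)
print(explanation.graph_count, explanation.graph_break_count)
```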
@mayank31398 I was not able to get any speedup with your version when training a 1.5B MoE model on H200s. Could you share your profiling implementation?
Also, remember me from IIT?
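For reference on the profiling question above, a minimal per-step timing sketch of the kind one might use (assuming CUDA-event timing around the forward pass; the `Linear` block is a placeholder, not the repo's MoE):

```python
import torch

def time_forward(module, x, warmup=10, iters=50):
    """Return the average forward latency of `module` in milliseconds, measured with CUDA events."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    for _ in range(warmup):          # warm up kernels / autotuning
        module(x)
    torch.cuda.synchronize()

    start.record()
    for _ in range(iters):
        module(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Example with a stand-in block; swap in the eager vs. compiled MoE module to compare.
block = torch.nn.Linear(4096, 4096).cuda()
x = torch.randn(8, 4096, device="cuda")
print("eager   :", time_forward(block, x))
print("compiled:", time_forward(torch.compile(block), x))
```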
@shikhartuli the speedup is more for the full MoE; the barebones kernel is not giving me a speedup either. Also, compile doesn't trace through the MLIR generated from Triton, and most of the code is in the kernel.
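To illustrate that point, here is a generic sketch (not the actual scattermoe kernel) of how a hand-written Triton kernel stays opaque to `torch.compile`, assuming a PyTorch version recent enough to accept user-defined Triton kernels under compile:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Hand-written kernel body: torch.compile never looks inside this code,
    # so Inductor cannot fuse or rewrite anything that happens here.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def fused_add(x, y):
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

# compile only optimizes the surrounding PyTorch ops; the Triton launch is kept as-is,
# which is why a module that is mostly one big kernel sees little benefit from compile.
compiled_add = torch.compile(fused_add)
a = torch.randn(1 << 20, device="cuda")
b = torch.randn(1 << 20, device="cuda")
torch.testing.assert_close(compiled_add(a, b), a + b)
```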
This is the repo I used for training: https://github.com/IBM/dolomite-engine. This is the sample config: https://github.com/IBM/dolomite-engine/blob/main/configs/pretraining-examples/moe/moe.yml. You can enable/disable compile in the config.
PS: yeah I remember LOL
When I compile the model, I get the following error. Any idea how to fix this?