pytorch / torchtitan

A native PyTorch Library for large model training
BSD 3-Clause "New" or "Revised" License

add compiled RMSNorm into the norm config #374

Open tianyu-l opened 1 month ago

tianyu-l commented 1 month ago

We should support compiled RMSNorm in the official release, instead of or in addition to the fused RMSNorm Triton kernel. We can switch to the CUDA kernel once it is ready in core (https://github.com/pytorch/pytorch/pull/121364#issuecomment-2023772997).
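A rough sketch of what this could look like, to make the proposal concrete. The names `RMSNorm`, `build_norm`, and the config string `"compiled_rmsnorm"` are hypothetical and not taken from the current torchtitan code; the fused Triton variant is left out on purpose. The only library call assumed beyond basic `torch.nn` is `torch.compile`.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """RMSNorm whose core computation can optionally be wrapped in torch.compile."""

    def __init__(self, dim: int, eps: float = 1e-6, compile: bool = False):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))
        # torch.compile is lazy: the actual compilation happens on the first call.
        self.rmsnorm_fn = (
            torch.compile(self.compute_rmsnorm, fullgraph=True)
            if compile
            else self.compute_rmsnorm
        )

    @staticmethod
    def compute_rmsnorm(x: torch.Tensor, weight: torch.Tensor, eps: float) -> torch.Tensor:
        # Normalize in float32 for numerical stability, then cast back to the input dtype.
        x_fp32 = x.float()
        normed = x_fp32 * torch.rsqrt(x_fp32.pow(2).mean(-1, keepdim=True) + eps)
        return normed.type_as(x) * weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.rmsnorm_fn(x, self.weight, self.eps)


def build_norm(norm_type: str, dim: int, eps: float = 1e-6) -> nn.Module:
    """Hypothetical factory mapping a norm config string to a module."""
    norm_type = norm_type.lower()
    if norm_type == "layernorm":
        return nn.LayerNorm(dim, eps=eps)
    if norm_type == "rmsnorm":
        return RMSNorm(dim, eps=eps)
    if norm_type == "compiled_rmsnorm":
        return RMSNorm(dim, eps=eps, compile=True)
    raise ValueError(f"Unknown norm_type: '{norm_type}'")
```

With something like this, the norm config would only need to accept one more string value, and the compiled path could later be swapped for the CUDA kernel behind the same factory.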