Open tianyu-l opened 1 month ago
We should support compiled RMSNorm in the official release, instead of or in addition to the fused RMSNorm Triton kernel. We can switch to the CUDA kernel once it is ready in core (https://github.com/pytorch/pytorch/pull/121364#issuecomment-2023772997).
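For illustration, here is a rough sketch of what a "compiled RMSNorm" option could look like: a plain PyTorch RMSNorm module that is optionally wrapped in `torch.compile`, so the compiler generates the fused kernel rather than a hand-written Triton one. The names `RMSNorm`, `build_rmsnorm`, and `compile_norm` are hypothetical and not taken from this repo, and the float32 upcast is an assumption about the desired numerics.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Plain-PyTorch RMSNorm (hypothetical sketch, not this repo's implementation)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-feature scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumption: compute the statistics in float32, then cast back to the input dtype.
        x_f32 = x.float()
        inv_rms = torch.rsqrt(x_f32.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return (x_f32 * inv_rms).type_as(x) * self.weight


def build_rmsnorm(dim: int, eps: float = 1e-6, compile_norm: bool = False) -> nn.Module:
    """Return an eager RMSNorm, or one wrapped in torch.compile if requested."""
    norm = RMSNorm(dim, eps)
    # torch.compile wraps the module lazily; compilation happens on the first call.
    return torch.compile(norm) if compile_norm else norm
```

With something like this, `build_rmsnorm(dim, compile_norm=True)` would select the compiled variant, and the eager module stays available as a reference for numerical comparison against the fused Triton kernel.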