linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training
BSD 2-Clause "Simplified" License

Label Smoothing Cross Entropy #81

Closed: zzw-zwzhang closed this issue 1 week ago

zzw-zwzhang commented 2 weeks ago

🚀 The feature, motivation and pitch

Will Liger-Kernel support label smoothing in its cross entropy kernel?

Alternatives

N/A

Additional context

N/A

ByronHsu commented 2 weeks ago

Yeah, we should! We might have to refactor/rewrite the cross entropy kernel a bit. Context: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
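
For reference, the `label_smoothing` option described in the linked PyTorch docs mixes the one-hot target with a uniform distribution over all classes. Below is a minimal pure-Python sketch of that per-sample loss. The function name `label_smoothing_cross_entropy` is hypothetical (it is not part of PyTorch or Liger-Kernel), and this is only a reference formula, not the Triton kernel:

```python
import math

def label_smoothing_cross_entropy(logits, target, label_smoothing=0.0):
    """Reference loss for one sample.

    logits: list of floats (one per class); target: index of the true class.
    Hypothetical helper for illustration only.
    """
    num_classes = len(logits)

    # Numerically stable log-softmax: subtract the max before exponentiating.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    log_probs = [x - log_sum_exp for x in logits]

    # Standard negative log-likelihood of the true class.
    nll = -log_probs[target]
    # Cross entropy against the uniform distribution over all classes.
    uniform = -sum(log_probs) / num_classes

    # Smoothed target = (1 - eps) * one_hot + eps / num_classes.
    return (1.0 - label_smoothing) * nll + label_smoothing * uniform

logits = [2.0, 0.5, -1.0]
plain = label_smoothing_cross_entropy(logits, target=0)
smoothed = label_smoothing_cross_entropy(logits, target=0, label_smoothing=0.1)
print(plain, smoothed)
```

With `label_smoothing=0.0` this reduces to ordinary cross entropy, so a fused kernel could plausibly add the uniform term on top of its existing computation; how the actual kernel does it is not stated in this thread.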

ByronHsu commented 1 week ago

The feature is now supported! Thanks to @Tcc0403.