linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training
https://arxiv.org/pdf/2410.10989
BSD 2-Clause "Simplified" License

LigerCrossEntropyLoss is not patched for latest transformers models #369

Closed: Tcc0403 closed this issue 3 weeks ago

Tcc0403 commented 3 weeks ago

🐛 Describe the bug

To fix the gradient accumulation (GA) bug, the latest transformers no longer imports CrossEntropyLoss. Instead, the loss computation is wrapped in self.loss_function, which can be traced back to here. As a result, the current patching method doesn't work on the latest transformers models.
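For illustration, here is a rough sketch of why the old-style patch stops working. The module path and the import of LigerCrossEntropyLoss are examples chosen for this sketch, not a quote of Liger-Kernel's actual monkey-patching code:

```python
import transformers.models.llama.modeling_llama as modeling_llama
from liger_kernel.transformers import LigerCrossEntropyLoss

# Old-style patch: replace the CrossEntropyLoss symbol that the modeling
# code used to import at module level.
modeling_llama.CrossEntropyLoss = LigerCrossEntropyLoss

# With recent transformers, the forward pass computes the loss through
# self.loss_function instead of referencing CrossEntropyLoss directly,
# so this assignment no longer affects the training loss.
```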

Note that the label and logit shifting operations are wrapped in loss_function as well; we need to handle them when implementing the new patching method.
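A minimal sketch of what a replacement loss could look like, assuming the wrapped loss_function receives logits, labels, and vocab_size (the exact signature and the model.loss_function patch point are assumptions for illustration, not the fix that landed):

```python
import torch
from liger_kernel.transformers import LigerCrossEntropyLoss

def liger_for_causal_lm_loss(
    logits: torch.Tensor, labels: torch.Tensor, vocab_size: int, **kwargs
) -> torch.Tensor:
    # Reproduce the shifting that transformers folds into loss_function:
    # tokens < n are used to predict token n.
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()

    loss_fct = LigerCrossEntropyLoss()
    return loss_fct(
        shift_logits.view(-1, vocab_size),
        shift_labels.view(-1).to(shift_logits.device),
    )

# Hypothetical patch point: override the wrapped loss on the model instance.
# model.loss_function = liger_for_causal_lm_loss
```

Any extra arguments the wrapped loss receives (for example a token-count normalizer related to the GA fix) would also need to be honored; the sketch ignores them via **kwargs for brevity.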

Reproduce

No response

Versions

none

ByronHsu commented 3 weeks ago

fixed by https://github.com/linkedin/Liger-Kernel/pull/375