Closed: Tcc0403 closed this issue 3 weeks ago.
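🐛 Describe the bug
To fix the gradient accumulation (GA) bug, the latest transformers no longer imports `CrossEntropyLoss` in its modeling code. Instead, the loss computation is wrapped in `self.loss_function`, which can be traced back to here. As a result, the current patching method, which replaces the imported `CrossEntropyLoss`, no longer works on models from the latest transformers.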
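To illustrate why the GA fix changes where the loss is normalized, here is a minimal pure-Python sketch (plain lists stand in for PyTorch tensors; every function name is hypothetical and is not transformers' or Liger-Kernel's API). As we understand it, the bug is that averaging each micro-batch's mean loss over GA steps differs from the full-batch token mean when micro-batches contain different numbers of non-ignored tokens; the fix divides each micro-batch's summed loss by the total token count across all micro-batches (transformers passes this count as `num_items_in_batch`, to our knowledge).

```python
import math

IGNORE_INDEX = -100  # label value excluded from the loss (PyTorch's default ignore_index)


def token_nll(logits_row, target):
    """Cross-entropy of one token: -log softmax(logits_row)[target], computed stably."""
    m = max(logits_row)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits_row))
    return log_z - logits_row[target]


def token_losses(logits, labels):
    """Per-token losses for one micro-batch, skipping ignored labels."""
    return [token_nll(row, t) for row, t in zip(logits, labels) if t != IGNORE_INDEX]


def ga_loss_mean_of_means(micro_batches):
    """Buggy pattern: each micro-batch takes its own mean, then GA averages those means."""
    means = [sum(ls) / len(ls) for ls in (token_losses(lg, lb) for lg, lb in micro_batches)]
    return sum(means) / len(means)


def ga_loss_global_count(micro_batches):
    """Fixed pattern: each micro-batch sums its token losses and divides by the
    total number of non-ignored tokens across all micro-batches."""
    num_items = sum(len(token_losses(lg, lb)) for lg, lb in micro_batches)
    return sum(sum(token_losses(lg, lb)) / num_items for lg, lb in micro_batches)


def full_batch_loss(micro_batches):
    """Reference: mean token loss over the whole, un-split batch."""
    all_losses = [x for lg, lb in micro_batches for x in token_losses(lg, lb)]
    return sum(all_losses) / len(all_losses)


# Two micro-batches with different numbers of counted tokens (1 vs 3).
micro_batches = [
    ([[0.0, 2.0], [1.0, 1.0]], [0, IGNORE_INDEX]),  # one counted token, one padding
    ([[2.0, 0.0]] * 3, [0, 0, 0]),                  # three counted tokens
]

print(full_batch_loss(micro_batches))
print(ga_loss_mean_of_means(micro_batches))
print(ga_loss_global_count(micro_batches))
```

Here `ga_loss_global_count` matches `full_batch_loss`, while `ga_loss_mean_of_means` overweights the single token in the first micro-batch. A replacement loss therefore has to respect whatever normalization `self.loss_function` performs, not just swap the cross-entropy kernel.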
Note that the label and logit shifting operations are also wrapped in `loss_function`, so we need to take care of them when implementing the new patching method.
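A minimal pure-Python sketch of what "taking care of the shift" means for a replacement loss (lists stand in for tensors; `shift_for_next_token`, `patched_causal_lm_loss`, and `toy_loss` are hypothetical names, not transformers' or Liger-Kernel's API). Because the shift now lives inside `loss_function`, a patched loss receives unshifted logits and labels and must drop the last prediction and the first label itself before computing the per-token loss.

```python
IGNORE_INDEX = -100  # label value excluded from the loss (PyTorch's default ignore_index)


def shift_for_next_token(logits, labels):
    """Align the prediction at position t with the label at position t + 1.

    logits: per-position score vectors, length seq_len
    labels: token ids, length seq_len (unshifted, e.g. a copy of input_ids)
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must cover the same sequence length")
    # The last position has no next token; the first label is never predicted.
    return logits[:-1], labels[1:]


def patched_causal_lm_loss(logits, labels, per_token_loss):
    """Hypothetical replacement loss: shift first, then average over non-ignored tokens.

    per_token_loss stands in for whatever kernel computes one token's loss.
    """
    shifted_logits, shifted_labels = shift_for_next_token(logits, labels)
    losses = [
        per_token_loss(row, t)
        for row, t in zip(shifted_logits, shifted_labels)
        if t != IGNORE_INDEX
    ]
    if not losses:
        # Every label is ignored; returning 0.0 is a choice made for this sketch.
        return 0.0
    return sum(losses) / len(losses)


def toy_loss(row, target):
    """Toy per-token loss (1 - score of the target); a stand-in, not cross-entropy."""
    return 1.0 - row[target]


logits = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
labels = [1, 0, 1]
print(patched_causal_lm_loss(logits, labels, toy_loss))
```

Skipping the shift would pair each prediction with its own input token instead of the next one, silently producing a different (wrong) loss rather than an error.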
Reproduce
No response
Versions
none
fixed by https://github.com/linkedin/Liger-Kernel/pull/375