tyler-romero opened 1 month ago
Curious, does it require changing the transformers source code? I think we can maybe raise a request.
Yeah, the proposed fix would unfortunately require a change to transformers. The way mllama was implemented differs very slightly from the conventions in other transformers modeling files.
Sounds good, let us try to send a PR there.
🐛 Describe the bug
Instead of only patching the transformers mllama module (`transformers.models.mllama.modeling_mllama`), `apply_liger_kernel_to_mllama` modifies `torch.nn.LayerNorm` globally. The issue is here.
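The mechanism behind the global patch is module aliasing: because the modeling file does `from torch import nn`, its `nn` attribute is the very same module object as `torch.nn`, so assigning through it mutates `torch.nn` for everyone. A minimal sketch with stand-in modules (no torch required) shows this:

```python
import types

# Toy stand-ins: `nn` plays the role of torch.nn, and `modeling_mllama`
# plays the role of transformers.models.mllama.modeling_mllama, which
# does `from torch import nn` and therefore holds a reference to the
# very same module object.
nn = types.ModuleType("nn")
nn.LayerNorm = "torch LayerNorm"

modeling_mllama = types.ModuleType("modeling_mllama")
modeling_mllama.nn = nn  # mimics `from torch import nn` in the modeling file

# Patching "through" the modeling file, as the Liger patch currently does...
modeling_mllama.nn.LayerNorm = "LigerLayerNorm"

# ...mutates the shared module object, so every user of `nn` sees the change.
print(nn.LayerNorm)  # LigerLayerNorm
```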
The fix would be to: (1) Not patch LayerNorm in Liger by assigning to `modeling_mllama.nn.LayerNorm`. (2) Change `transformers.models.mllama.modeling_mllama` to not use `from torch import nn` and to instead just import LayerNorm, like `from torch.nn import LayerNorm`. (3) Instead patch LayerNorm in Liger by assigning to `modeling_mllama.LayerNorm`.
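With the proposed fix, the modeling file would bind the class name directly (`from torch.nn import LayerNorm`), so reassigning that module-level name only rebinds the modeling module's own attribute. The same toy setup illustrates the difference:

```python
import types

# Same stand-in modules as before, but now the modeling module imports
# the class directly, as the proposed `from torch.nn import LayerNorm` would.
nn = types.ModuleType("nn")
nn.LayerNorm = "torch LayerNorm"

modeling_mllama = types.ModuleType("modeling_mllama")
modeling_mllama.LayerNorm = nn.LayerNorm  # direct name import

# Patching the module-level name only rebinds this module's attribute...
modeling_mllama.LayerNorm = "LigerLayerNorm"

# ...and the stand-in torch.nn is left untouched.
print(nn.LayerNorm)               # torch LayerNorm
print(modeling_mllama.LayerNorm)  # LigerLayerNorm
```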
Reproduce
Versions
Environment Report:
Operating System: Linux-6.1.85+-x86_64-with-glibc2.35
Python version: 3.10.12
PyTorch version: 2.4.1+cu121
CUDA version: Not available
Triton version: 3.1.0
Transformers version: 4.45.0