Status: Open. ReginaZh opened this issue 4 days ago.
I think it's related to the DeepSpeed model initialization method. When using DeepSpeed ZeRO-3, the model should be initialized inside a context in which every newly created tensor has zero size; the sharding and broadcast are then implemented inside DeepSpeed's own source. Something may have broken either in recent Liger diffs or in a new DeepSpeed/HF release. I'll take a look and get back to this issue ASAP.
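To make the zero-size-placeholder idea concrete, here is a toy, stdlib-only simulation of ZeRO-3-style partitioning. It is not DeepSpeed's API: in real code the context manager is deepspeed.zero.Init, and DeepSpeed records the full shape on each parameter (for example as ds_shape). ToyShardedParam, partition and all_gather are hypothetical names, and the sketch does not model the broadcast step; it only shows that each rank keeps a padded 1/N slice of the flattened values, the tensor the module sees has size 0, and an all-gather restores the full parameter when it is needed.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Tuple


@dataclass
class ToyShardedParam:
    """Hypothetical stand-in for one rank's view of a ZeRO-3 parameter."""
    full_shape: Tuple[int, ...]             # shape the module asked for (cf. ds_shape)
    local_shard: List[float]                # this rank's slice of the flattened values
    visible_shape: Tuple[int, ...] = (0,)   # zero-size placeholder the module sees


def partition(values: List[float], shape: Tuple[int, ...],
              world_size: int) -> List[ToyShardedParam]:
    """Split a flattened parameter evenly across ranks, zero-padding the tail."""
    total = prod(shape)
    if len(values) != total:
        raise ValueError("number of values does not match shape")
    shard_size = -(-total // world_size)  # ceiling division
    padded = values + [0.0] * (shard_size * world_size - total)
    return [
        ToyShardedParam(shape, padded[r * shard_size:(r + 1) * shard_size])
        for r in range(world_size)
    ]


def all_gather(shards: List[ToyShardedParam]) -> Tuple[List[float], Tuple[int, ...]]:
    """Rebuild the full parameter (dropping padding), as happens just before use."""
    shape = shards[0].full_shape
    flat = [v for shard in shards for v in shard.local_shard]
    return flat[: prod(shape)], shape
```

With world_size=4, a 2×5 parameter becomes four shards of three values each, the last one padded with two zeros, while every shard reports a visible shape of (0,) until the values are gathered.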
So the cause was that ignore_mismatch_shapes=True had been accidentally dropped, and this was fixed very recently in https://github.com/linkedin/Liger-Kernel/pull/263 😄 @ReginaZh, you can try installing liger-kernel-nightly, which should fix your issue. @shimizust, do you think we can make a quick patch release for it 🤔?
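To illustrate what such a flag changes, here is a toy loader that follows the documented semantics of the Hugging Face from_pretrained keyword ignore_mismatched_sizes (which the comment's ignore_mismatch_shapes presumably refers to): checkpoint weights whose shapes disagree with the model are skipped with a warning instead of raising. The dict-based "state dict" and load_checkpoint are hypothetical; this is not the Transformers implementation, and it does not claim to show how ZeRO-3 loading interacts with the flag.

```python
import warnings
from typing import Dict, List, Tuple

# Hypothetical toy "state dict": name -> (shape, flat values).
Params = Dict[str, Tuple[Tuple[int, ...], List[float]]]


def load_checkpoint(model: Params, checkpoint: Params,
                    ignore_mismatched_sizes: bool = False) -> List[str]:
    """Copy checkpoint entries into model; return the names that were skipped."""
    skipped = []
    for name, (ckpt_shape, values) in checkpoint.items():
        model_shape, _ = model[name]
        if model_shape != ckpt_shape:
            if not ignore_mismatched_sizes:
                # Default behaviour: a shape disagreement is a hard error.
                raise RuntimeError(
                    f"size mismatch for {name}: checkpoint {ckpt_shape}, model {model_shape}"
                )
            # With the flag set, keep the model's own initialized weights and warn.
            warnings.warn(f"skipping {name}: checkpoint {ckpt_shape} vs model {model_shape}")
            skipped.append(name)
            continue
        model[name] = (ckpt_shape, list(values))
    return skipped
```

With the default flag, the first mismatched entry raises; with ignore_mismatched_sizes=True, the mismatched weight keeps its initialized values and the matching ones are loaded.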
🐛 Describe the bug
I tried to reproduce the Liger Kernel optimization with the Lightning Trainer and DeepSpeed ZeRO-3, but encountered several errors.
Reproduce
script:
output:
I fixed the above error by adding "import deepspeed" to training.py, but after that another error was raised:
Versions
Environment Report:
Operating System: Linux-6.5.0-1025-azure-x86_64-with-glibc2.31
Python version: 3.10.14
PyTorch version: 2.4.1+cu121
CUDA version: 12.1
Triton version: 3.0.0
Transformers version: 4.42.4
deepspeed version: 0.15.0
liger_kernel version: 0.3.0
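To collect a report like the one above on another machine, a small stdlib sketch could gather the same fields. This is an independent illustration, not Liger Kernel's own reporting tool; the PyPI distribution names are assumptions (a nightly install would be liger-kernel-nightly rather than liger-kernel), and the CUDA version is read from torch.version.cuda only if PyTorch can be imported.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Report label -> PyPI distribution name (assumed; adjust for nightly builds).
PACKAGES = {
    "PyTorch": "torch",
    "Triton": "triton",
    "Transformers": "transformers",
    "deepspeed": "deepspeed",
    "liger_kernel": "liger-kernel",
}


def package_version(dist_name: str) -> str:
    """Return the installed version of a distribution, or a marker if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "not installed"


def environment_report() -> dict:
    """Collect roughly the same fields as the environment report in this issue."""
    report = {
        "Operating System": platform.platform(),
        "Python version": platform.python_version(),
    }
    for label, dist_name in PACKAGES.items():
        report[f"{label} version"] = package_version(dist_name)
    # The CUDA version is only available through torch, if it is installed.
    try:
        import torch
        report["CUDA version"] = torch.version.cuda or "not available (CPU build)"
    except ImportError:
        report["CUDA version"] = "unknown (torch not installed)"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```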