linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training
https://arxiv.org/pdf/2410.10989
BSD 2-Clause "Simplified" License

Empty Medusa head tensors #309

Open vkc1vk opened 1 month ago

vkc1vk commented 1 month ago

🐛 Describe the bug

Tensors saved in medusa_only_heads mode are empty. Ref: https://github.com/linkedin/Liger-Kernel/blob/main/examples/medusa/train.py#L392

Reproduce

No response

Versions

N/A

vkc1vk commented 1 month ago

cc: @jaszhu13

chiwanpark commented 3 weeks ago

The problem is caused by use_orig_params: true in the FSDP configuration (link). With this setting, the model's variables are different from the variables used for training; thus, even if we add Medusa heads to the model variables, the FSDP-wrapped variables are empty.

The workaround is to use the model loader in Trainer. I'll send a PR to fix this bug soon.
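
Since the issue has no reproduction steps, here is a minimal sketch of how one might confirm the symptom by scanning a saved state dict for zero-element tensors. The helper name find_empty_tensors and the "medusa" key substring are assumptions for illustration, not names taken from the Liger-Kernel example; the only real API relied on is the numel() method of torch.Tensor.

```python
def find_empty_tensors(state_dict, key_filter="medusa"):
    """Return the names of entries whose tensor holds zero elements.

    `state_dict` is any mapping of parameter name -> tensor-like object
    exposing `numel()` (as torch.Tensor does).
    `key_filter` is a hypothetical substring used to select the Medusa
    head entries; pass "" to check every entry.
    """
    empty = []
    for name, tensor in state_dict.items():
        # Skip entries that do not match the (assumed) key substring.
        if key_filter and key_filter not in name:
            continue
        # numel() returns the total number of elements in the tensor.
        if tensor.numel() == 0:
            empty.append(name)
    return empty


# Example usage (the file name below is hypothetical):
#   import torch
#   saved = torch.load("medusa_heads.pt", map_location="cpu")
#   print(find_empty_tensors(saved))
```

If the returned list is non-empty for the saved heads, the checkpoint shows the behavior described in this issue.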