linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training
https://arxiv.org/abs/2410.10989
BSD 2-Clause "Simplified" License
3.23k stars 173 forks

Empty Medusa head tensors #309

Open vkc1vk opened 2 days ago

vkc1vk commented 2 days ago

🐛 Describe the bug

In medusa_only_heads mode, the saved Medusa head tensors are empty. Ref: https://github.com/linkedin/Liger-Kernel/blob/main/examples/medusa/train.py#L392

Reproduce

No response

Versions

N/A

vkc1vk commented 1 day ago

cc: @jaszhu13