Open vkc1vk opened 1 month ago
cc: @jaszhu13
The problem is caused by `use_orig_params: true` in the FSDP configuration (link). This setting means that the model variables are different from the variables used for training; thus, even if we add Medusa heads to the model variables, the FSDP-wrapped variables are empty.
The workaround is to use the model loader in Trainer. I'll send a PR to fix this bug soon.
🐛 Describe the bug
Tensors saved in `medusa_only_heads` mode are empty. Ref: https://github.com/linkedin/Liger-Kernel/blob/main/examples/medusa/train.py#L392
Reproduce
No response
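In the absence of a full reproduction, here is a minimal sketch of how one might check a saved state dict for the symptom, i.e. entries with zero elements. It is not taken from the Liger-Kernel example: `find_empty_tensors`, `StandInTensor`, and the key names are hypothetical, and the stand-in class is only there so the snippet runs without PyTorch. Real `torch.Tensor` objects also expose `numel()`, so the same helper should apply to a loaded checkpoint.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Protocol


class HasNumel(Protocol):
    """Anything that reports its element count, e.g. torch.Tensor."""

    def numel(self) -> int: ...


def find_empty_tensors(state_dict: Mapping[str, HasNumel]) -> list[str]:
    """Return the names of entries that hold zero elements."""
    return [name for name, tensor in state_dict.items() if tensor.numel() == 0]


@dataclass
class StandInTensor:
    """Hypothetical stand-in for a tensor; only tracks a shape."""

    shape: tuple[int, ...]

    def numel(self) -> int:
        # Element count is the product of all dimensions.
        count = 1
        for dim in self.shape:
            count *= dim
        return count


# Hypothetical key names; the real checkpoint keys may differ.
saved = {
    "medusa_head.0.weight": StandInTensor(shape=(0,)),
    "medusa_head.1.weight": StandInTensor(shape=(4, 8)),
}

empty = find_empty_tensors(saved)
print(empty)
```

If every Medusa head key shows up in the returned list, that would match the behavior reported above.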
Versions
N/A