thusinh1969 opened 1 month ago
Oh, it's possible the lm_head and embed_tokens were not enabled for training in the 2nd run.
I found it. Same setup; just passing `resume_from_checkpoint` seems to start the continued finetuning. Will have to check quality later, though.
Thanks a lot. Steve
Evidence:
We finetune without Unsloth in QLoRA with rank 32, targeting all linear layers AND embed/lm_head (10x smaller lr, same padding-right setup as Unsloth), for a total of 1,134,559,232 trainable parameters. Our finetuned context length is 32768 on an H100 NVL, and our final QLoRA adapter size is 2.6G.
Unsloth continued finetune: after this finetuning succeeded, we continue with Unsloth.
Unsloth shows only 167,772,160 parameters being trained!!! It seems it loads Unsloth's default settings and not my QLoRA setup & finetuned weights at all?
Even weirder, Unsloth still saved the checkpoint and final adapter adapter_model.safetensors at exactly 2.6G in size. Why? What is actually going on?
Any idea why?
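As a rough sanity check on the two trainable-parameter figures above: fully training embed_tokens/lm_head contributes far more parameters than rank-32 LoRA on the linear layers, so a gap like 1,134,559,232 vs 167,772,160 is consistent with the second run silently dropping those modules. A minimal pure-Python sketch (the hidden/vocab sizes below are hypothetical placeholders, not read from the actual model):

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable params LoRA adds to one (d_out x d_in) linear layer:
    matrix A is (rank x d_in), matrix B is (d_out x rank)."""
    return rank * (d_in + d_out)

def full_matrix_params(rows: int, cols: int) -> int:
    """Params when a module (e.g. embed_tokens or lm_head) is trained in full."""
    return rows * cols

# Hypothetical shapes for illustration only -- substitute the real hidden size,
# vocab size, and per-layer projection shapes of your model.
hidden, vocab, rank = 4096, 128256, 32

one_square_proj = lora_param_count(hidden, hidden, rank)
print(one_square_proj)         # LoRA params for a single 4096x4096 projection

embed_if_fully_trained = full_matrix_params(vocab, hidden)
print(embed_if_fully_trained)  # hundreds of millions for embed_tokens alone
```

If the resumed run's count only matches the LoRA-on-linear-layers sum, that would mean the embed/lm_head weights were excluded, even though the saved adapter file size happened to match.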
Thanks, Steve