terminator123 opened 10 months ago
You need to copy the config.json and non_lora_trainables.bin into your checkpoint-5000 folder.
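For reference, a minimal sketch of that copy step (the paths are placeholders; adjust them to your own training output layout):

```python
import os
import shutil

# Placeholder paths: adjust to your own output directory layout.
run_dir = "./checkpoints/llava-v1.5-13b-lora"        # where config.json and non_lora_trainables.bin were saved
ckpt_dir = os.path.join(run_dir, "checkpoint-5000")  # the intermediate checkpoint you want to evaluate

# Copy the two files the intermediate checkpoint is missing.
for fname in ("config.json", "non_lora_trainables.bin"):
    shutil.copy(os.path.join(run_dir, fname), os.path.join(ckpt_dir, fname))
```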
I have the same problem as #1194. Did you solve it?
You need to copy the config.json and non_lora_trainables.bin into your checkpoint-5000 folder.

Are config.json and non_lora_trainables.bin saved only at the end of the entire training? I have set 10 epochs; can I copy these two files from the epoch-10 checkpoint directly to the first nine?
Are config.json and non_lora_trainables.bin saved only at the end of the entire training?
I think so.
I have set 10 epochs; can I copy these two files from the epoch-10 checkpoint directly to the first nine?
The weights of the projector are saved in non_lora_trainables.bin, which is unfrozen during the SFT stage.
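For context, applying those saved projector weights to an already-built model looks roughly like this (a sketch only; the helper name and the key-prefix handling are illustrative, not LLaVA's exact loader code):

```python
import torch

def load_projector_weights(model, path="non_lora_trainables.bin"):
    # Sketch: apply the saved (non-LoRA) projector weights to a built model.
    non_lora = torch.load(path, map_location="cpu")
    # PEFT wrapping can leave a "base_model.model." prefix on the keys.
    non_lora = {k.split("base_model.model.")[-1]: v for k, v in non_lora.items()}
    # Only the projector keys are present, so load non-strictly.
    model.load_state_dict(non_lora, strict=False)
    return model
```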
Thank you for your reply! But I still have some questions.
The weights of the projector are saved in non_lora_trainables.bin, which is unfrozen during the SFT stage.
- Doesn't non_lora_trainables.bin store the weights that are not part of the LoRA adaptation? Shouldn't those weights be frozen? Why does it store the projector weights?
- In your previous answer you said to copy the two files into the corresponding checkpoint folder. If the projector is unfrozen during the SFT stage, that approach seems incorrect. How can I merge an intermediate checkpoint with the LoRA weights? Could you give a more detailed explanation? Thank you!
non_lora_trainables means non-LoRA and trainable: it stores the projector weights because the projector is trained directly rather than through LoRA. Check here
Try:

```python
import torch

# Inspect what non_lora_trainables.bin actually contains.
a = torch.load('.../non_lora_trainables.bin')
print(a.keys())
```
Yes, you are right. And you may need to edit the source code to save the projector weights at intermediate checkpoints.
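If it helps, one possible way to do that is a Hugging Face TrainerCallback like the sketch below. The callback itself and the "mm_projector" name filter are assumptions based on LLaVA's parameter naming, not code from the repo, and under DeepSpeed ZeRO-3 you would also need to gather the sharded parameters first, as LLaVA's own saving helpers do:

```python
import os
import torch
from transformers import TrainerCallback

class SaveProjectorCallback(TrainerCallback):
    # Sketch: dump the (non-LoRA) projector weights next to every checkpoint.
    # Assumes projector parameter names contain "mm_projector"; adjust if not.
    def on_save(self, args, state, control, model=None, **kwargs):
        if model is None:
            return control
        projector_state = {
            name: param.detach().cpu()
            for name, param in model.named_parameters()
            if "mm_projector" in name
        }
        ckpt_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        os.makedirs(ckpt_dir, exist_ok=True)
        torch.save(projector_state, os.path.join(ckpt_dir, "non_lora_trainables.bin"))
        return control

# Usage: trainer.add_callback(SaveProjectorCallback())
```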
Question

I want to test checkpoint-5000 with LoRA. When I ran python scripts/merge_lora_weights.py --model-path ./checkpoints/llava-v1.5-13b-lora --model-base lmsys/vicuna-13b-v1.5 --save-model-path ./checkpoints/merge, it went wrong.
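For anyone hitting this: as the first reply notes, the usual cause is that config.json and non_lora_trainables.bin are missing from the checkpoint folder. Conceptually the merge itself is just PEFT's merge step plus restoring the projector; below is a rough generic sketch using transformers and peft, not LLaVA's actual merge_lora_weights.py (which builds its own multimodal model class and also loads non_lora_trainables.bin):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Generic LoRA-merge sketch (illustrative only; LLaVA's script differs).
base = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-13b-v1.5", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-13b-v1.5")

# The checkpoint folder must contain the adapter files plus config.json;
# the projector weights come from non_lora_trainables.bin in the same folder.
lora_path = "./checkpoints/llava-v1.5-13b-lora/checkpoint-5000"
model = PeftModel.from_pretrained(base, lora_path)
model = model.merge_and_unload()  # fold the LoRA deltas into the base weights

model.save_pretrained("./checkpoints/merge")
tokenizer.save_pretrained("./checkpoints/merge")
```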