haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Question] how to merge the middle checkpoint file with lora #922

Open terminator123 opened 10 months ago

terminator123 commented 10 months ago

Question

I want to test checkpoint-5000 from the LoRA training. When I ran python scripts/merge_lora_weights.py --model-path ./checkpoints/llava-v1.5-13b-lora --model-base lmsys/vicuna-13b-v1.5 --save-model-path ./checkpoints/merge, it failed.

Isaachhh commented 10 months ago

you need to copy the config.json and non_lora_trainables.bin into your checkpoint-5000 folder
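
For example, a minimal sketch of that copy step in Python (paths are illustrative and assume the checkpoint-5000 folder sits inside the finished LoRA output directory):

    import shutil

    lora_dir = "./checkpoints/llava-v1.5-13b-lora"   # finished LoRA output folder
    ckpt_dir = f"{lora_dir}/checkpoint-5000"         # intermediate checkpoint to merge

    # Copy the model config and the non-LoRA trainable weights (the projector)
    # next to the adapter weights of the intermediate checkpoint.
    for fname in ["config.json", "non_lora_trainables.bin"]:
        shutil.copy(f"{lora_dir}/{fname}", f"{ckpt_dir}/{fname}")

Then run scripts/merge_lora_weights.py with --model-path pointing at the checkpoint-5000 folder rather than the top-level LoRA folder.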

charismaticchiu commented 8 months ago

I have the same problem, see #1194. Did you solve it?

wuwu-C commented 6 months ago

you need to copy the config.json and non_lora_trainables.bin into your checkpoint-5000 folder

Are config.json and non_lora_trainables.bin saved only at the end of the entire training? I set training to 10 epochs; can I copy these two files from the epoch-10 checkpoint directly into the first nine?

Isaachhh commented 6 months ago

Are config.json and non_lora_trainables.bin saved only at the end of the entire training?

I think so.

I set training to 10 epochs; can I copy these two files from the epoch-10 checkpoint directly into the first nine?

The projector weights are saved in non_lora_trainables.bin; the projector is unfrozen (trained) during the SFT stage.

wuwu-C commented 6 months ago

Thank you for your reply! But I still have some questions.

The projector weights are saved in non_lora_trainables.bin; the projector is unfrozen (trained) during the SFT stage.

  1. Isn't non_lora_trainables.bin supposed to store the weights that LoRA does not touch? Shouldn't those weights be frozen? Why does it store the projector weights?
  2. In your previous answer you said to copy the two files into the corresponding checkpoint folder. If the projector is unfrozen during the SFT stage, that approach seems incorrect, since the projector keeps training after the intermediate checkpoint. How can I merge an intermediate checkpoint with the LoRA weights? Could you give a more detailed explanation? Thank you!

Isaachhh commented 6 months ago

Thank you for your reply! But I still have some questions.

The projector weights are saved in non_lora_trainables.bin; the projector is unfrozen (trained) during the SFT stage.

  1. Isn't non_lora_trainables.bin supposed to store the weights that LoRA does not touch? Shouldn't those weights be frozen? Why does it store the projector weights?
  2. In your previous answer you said to copy the two files into the corresponding checkpoint folder. If the projector is unfrozen during the SFT stage, that approach seems incorrect, since the projector keeps training after the intermediate checkpoint. How can I merge an intermediate checkpoint with the LoRA weights? Could you give a more detailed explanation? Thank you!

  1. non_lora_trainables means "non-LoRA" and "trainable": it stores the projector because the projector is trained directly rather than through LoRA. Check here. Try:

         import torch

         a = torch.load('.../non_lora_trainables.bin')
         print(a.keys())

  2. Yes, you are right. You may need to edit the training code so that the projector weights (non_lora_trainables.bin) are also saved at intermediate checkpoints.
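
A minimal sketch of such an edit, as an assumption rather than the actual LLaVA code: register a Hugging Face TrainerCallback that saves the non-LoRA trainable weights into every checkpoint folder. The class name and the simple "lora_" name filter are illustrative; LLaVA's train.py gathers these weights with get_peft_state_non_lora_maybe_zero_3 to handle DeepSpeed ZeRO-3, which this sketch does not do.

    import os
    import torch
    from transformers import TrainerCallback

    class SaveNonLoraTrainablesCallback(TrainerCallback):
        """Save the non-LoRA trainable weights (e.g. the projector) at every checkpoint."""

        def on_save(self, args, state, control, model=None, **kwargs):
            # Collect parameters that are trained directly (requires_grad=True)
            # but are not LoRA adapter weights.
            non_lora_state = {
                name: param.detach().cpu()
                for name, param in model.named_parameters()
                if param.requires_grad and "lora_" not in name
            }
            # The Trainer names intermediate checkpoints "checkpoint-<global_step>".
            ckpt_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
            os.makedirs(ckpt_dir, exist_ok=True)
            torch.save(non_lora_state, os.path.join(ckpt_dir, "non_lora_trainables.bin"))

    # Hypothetical registration when the trainer is built, e.g. in train.py:
    # trainer = LLaVATrainer(..., callbacks=[SaveNonLoraTrainablesCallback()])

You would still need to place config.json in each checkpoint folder (or save it from the callback as well) before running scripts/merge_lora_weights.py.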