magic-research / PLLaVA

Official repository for the paper PLLaVA

finetune problem #43

Open AshOneN opened 3 months ago

AshOneN commented 3 months ago

I'd like to fine-tune pllava-7B on downstream tasks. I modified config_pllava_nframe.py and set up my train_pllava.sh accordingly (screenshots attached).

Training runs normally, and I believe pllava_video_outputs/test_train_7b_reconstruct/pretrained_epoch09 holds the LoRA and projection-layer weights from my last training round. So I ran bash scripts/demo.sh pllava-7B /public/nijiahui/pllava_video_outputs/test_train_7b_reconstruct/pretrained_epoch09 to test. The demo runs without errors, but the model produces no output (screenshots attached). Is there a problem with my procedure?

gaowei724 commented 3 months ago

Hi, I ran into the same problem and have no clue so far. Have you solved it?

AshOneN commented 3 months ago

I've discovered two methods to solve the problem.

One approach involves running the script "bash scripts/demo.sh ${model_dir} ${weights_dir}", setting model_dir to llava-hf/llava-v1.6-vicuna-7b-hf and weights_dir to pretrained_epochXX.

Alternatively, you can create a new folder and copy in the files from ckpt_epochXX, pretrained_epochXX, and pretrained_stepXX. The model.safetensors weight files from ckpt_epochXX and pretrained_epochXX share the same name; keep the one from ckpt_epochXX.
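For concreteness, here is a minimal sketch of that merge. The epoch number follows the original post; "XX" in pretrained_stepXX is a placeholder for your actual step, and the exact contents of each checkpoint folder may differ in your run:

```python
# Hedged sketch of the "merge into one folder" method. Directory names are
# examples: epoch 09 comes from the original post, "XX" in pretrained_stepXX
# stands in for your actual step number.
import shutil
from pathlib import Path

merged = Path("merged_pllava_epoch09")
merged.mkdir(exist_ok=True)

# Copy the pretrained_* folders first and ckpt_epoch09 last, so that when two
# folders contain a file named model.safetensors, the copy that survives in
# the merged folder is the one from ckpt_epoch09.
for src in ["pretrained_epoch09", "pretrained_stepXX", "ckpt_epoch09"]:
    for f in Path(src).iterdir():
        if f.is_file():
            shutil.copy2(f, merged / f.name)
```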

Testing both methods with fine-tuned models yielded identical outputs for the same inputs.

ermu2001 commented 3 months ago

model_dir here is the base model's weights that are loaded by PllavaForCausualLM.from_pretrained. It contains the weights of the original image language model.

weight_dir then points to the weights saved during video training (for the 7B LoRA run, only the projector and LoRA weights are trained). This part of the model is loaded after the PeftModel is obtained with get_peft_model.

At least one of model_dir or weight_dir should contain the image model's weights, so setting model_dir to llava-hf/llava-v1.6-vicuna-7b-hf is the easiest way to satisfy that.
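As a rough illustration of that two-stage load (not the repo's actual code: it uses transformers' public LLaVA-NeXT class as a stand-in for the PLLaVA model class, and the LoRA settings and paths are placeholders):

```python
# Minimal sketch of the two-stage load described above, assuming
# LlavaNextForConditionalGeneration as a stand-in for the repo's PLLaVA class
# and placeholder LoRA hyperparameters. See tasks/eval/model_utils.py in the
# repo for the real logic.
from peft import LoraConfig, get_peft_model
from safetensors.torch import load_file
from transformers import LlavaNextForConditionalGeneration

model_dir = "llava-hf/llava-v1.6-vicuna-7b-hf"   # base image language model
weight_dir = "pllava_video_outputs/test_train_7b_reconstruct/pretrained_epoch09"

# Stage 1: the base weights come from model_dir.
model = LlavaNextForConditionalGeneration.from_pretrained(model_dir)

# Stage 2: wrap with PEFT, then overlay the weights saved during video
# training (projector + LoRA only, for the 7B LoRA run) from weight_dir.
model = get_peft_model(model, LoraConfig(r=32, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"]))
state_dict = load_file(f"{weight_dir}/model.safetensors")
# strict=False because only the projector / LoRA keys are present; key names
# may need remapping depending on how the checkpoint was saved.
model.load_state_dict(state_dict, strict=False)
```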

> I've discovered two methods to solve the problem.
>
> One approach involves running the script "bash scripts/demo.sh ${model_dir} ${weights_dir}", setting model_dir to llava-hf/llava-v1.6-vicuna-7b-hf and weights_dir to pretrained_epochXX.
>
> Alternatively, you can create a new folder and copy in the files from ckpt_epochXX, pretrained_epochXX, and pretrained_stepXX. The model.safetensors weight files from ckpt_epochXX and pretrained_epochXX share the same name; keep the one from ckpt_epochXX.
>
> Testing both methods with fine-tuned models yielded identical outputs for the same inputs.

BTW, the code will only load from either a full model (sharded in the Hugging Face format) or LoRA weights + projector weights (model.safetensors). It loads the full model with higher priority, so I think this solution would only load the PLLaVA weights.

https://github.com/magic-research/PLLaVA/blob/fd9194ae55750c2e1ac677056f6286c126eda580/tasks/eval/model_utils.py#L73-L102
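For readers skimming the thread, a simplified sketch of that priority check (not the repo's implementation; the sharded-index file name follows the standard Hugging Face convention):

```python
# Simplified sketch of the loading priority described above; see the linked
# model_utils.py for the repo's actual logic.
import os

def checkpoint_kind(weight_dir: str) -> str:
    """Classify what weight_dir holds: a full sharded model wins over a single
    model.safetensors that carries only the LoRA + projector weights."""
    if os.path.exists(os.path.join(weight_dir, "model.safetensors.index.json")):
        return "full model (sharded Hugging Face format)"
    if os.path.exists(os.path.join(weight_dir, "model.safetensors")):
        return "LoRA + projector weights only"
    return "no recognizable weights"
```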