Open lezhang7 opened 8 months ago
I ran into the same problem when running finetune_lora.sh: the loss suddenly increases during training. The only modification I made was to use half of the llava-v1_5_mix665k samples.
With LoRA it works without this issue, but I still wonder why this happens with full-parameter finetuning.
Hi, have you solved the problem? I also ran into it when finetuning Video-LLaVA, which is derived from LLaVA...
I had the same issue, and I figured it was because I was using Hugging Face's "llava-hf/llava-1.5-7b-hf" as the base model. I switched the base to "liuhaotian/llava-v1.5-7b" and that resolved the NaN issue. Plus, the training performance got much better.
Question
Hi,
I have successfully pretrained the mm_projector and finished the finetuning stage with the following script:
However, when I evaluate on the task, the output is always empty and inference becomes quite slow. So I debugged step by step and found that the weights in
llava-llama-2-7b-chat-finetune/model-0000x-of-00003.safetensors
seem to contain many NaNs, as shown below:

I followed the official pretraining and finetuning scripts. Any idea why this happens and how to fix it?
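
For anyone who wants to reproduce the check on their own checkpoint, here is a minimal sketch of how a shard could be scanned for NaNs using only the Python standard library. It is not the script used above. It assumes the standard safetensors layout (an 8-byte little-endian header length, a JSON header, then the raw tensor bytes) and only looks at F16, BF16, and F32 tensors. `find_nan_tensors` is a hypothetical helper name.

```python
import json
import struct

# Per safetensors dtype tag: (struct format, exponent mask, mantissa mask).
# A value is NaN when all exponent bits are set and the mantissa is non-zero.
_FLOAT_LAYOUTS = {
    "F16": ("<H", 0x7C00, 0x03FF),
    "BF16": ("<H", 0x7F80, 0x007F),
    "F32": ("<I", 0x7F800000, 0x007FFFFF),
}


def find_nan_tensors(path):
    """Return the names of tensors in a .safetensors file that contain a NaN."""
    bad = []
    with open(path, "rb") as f:
        # The file starts with the header size as an unsigned 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        data_start = 8 + header_len
        for name, info in header.items():
            # Skip the optional metadata entry and non-float dtypes.
            if name == "__metadata__" or info["dtype"] not in _FLOAT_LAYOUTS:
                continue
            fmt, exp_mask, man_mask = _FLOAT_LAYOUTS[info["dtype"]]
            # Offsets are relative to the start of the data section.
            begin, end = info["data_offsets"]
            f.seek(data_start + begin)
            raw = f.read(end - begin)
            # any() stops at the first NaN found in this tensor.
            if any(
                (bits & exp_mask) == exp_mask and (bits & man_mask) != 0
                for (bits,) in struct.iter_unpack(fmt, raw)
            ):
                bad.append(name)
    return bad
```

Calling `find_nan_tensors` on each shard of the checkpoint returns the affected parameter names, which shows whether the NaNs are confined to a few layers or spread across the whole model. A pure-Python scan like this is slow on multi-gigabyte shards, so in practice loading the shard with the safetensors library and checking each tensor with the framework's own NaN test would likely be faster.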