Open · rorubyy opened 9 months ago
@rorubyy I have run the scripts the same way you did, but I've run into new issues. How should I set the max model sequence length, and how can I fix the problem with uploading many images? Here is my test JSON:
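For the sequence-length part, a rough check is to tokenize each conversation and add the per-image token expansion. This sketch assumes the standard LLaVA conversations JSON layout; the 2048 limit, the ~576 vision tokens per <image> in v1.5, and the file paths are assumptions:

```python
# Rough length check for LLaVA-style training/eval JSON. The 2048 limit, the
# ~576 tokens per <image> in v1.5, and the file path are all assumptions.
import json
from transformers import AutoTokenizer

MAX_LEN = 2048          # the --model_max_length you trained with (assumed)
IMAGE_TOKENS = 576      # approx. vision tokens per image in LLaVA v1.5

tokenizer = AutoTokenizer.from_pretrained("liuhaotian/llava-v1.5-7b", use_fast=False)

with open("test.json") as f:  # illustrative path to your test JSON
    samples = json.load(f)

for i, sample in enumerate(samples):
    # Join all conversation turns, then estimate the final length after the
    # <image> placeholders are expanded into vision tokens.
    text = " ".join(turn["value"] for turn in sample["conversations"])
    n_text = len(tokenizer(text).input_ids)
    n_images = text.count("<image>")
    approx = n_text + n_images * IMAGE_TOKENS
    if approx > MAX_LEN:
        print(f"sample {i}: ~{approx} tokens with {n_images} image(s); "
              f"truncate it or raise --model_max_length")
```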
@rorubyy What size is your custom dataset? I'm curious about its performance with smaller datasets.
Hi @rorubyy,
Were you able to figure out why the hidden states were NaN? I'm facing the same issue.
Hello @rorubyy @chanangad
I am also facing the same issue. Does anyone have a solution or any ideas on how to fix it?
I encountered the same issue while running model_vqa.py with a fine-tuned 7B model.
I used to have the same issue, and I figured it was because I was using Hugging Face's "llava-hf/llava-1.5-7b-hf" as the base model. I switched the base to "liuhaotian/llava-v1.5-7b" and that resolved the NaN issue. Plus, the training performance got much better.
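If it helps, here is a minimal sketch of loading a LoRA checkpoint against the original base via the repo's load_pretrained_model helper; the local checkpoint path is illustrative, and the model name must contain "lora" so the builder takes the adapter-merge path:

```python
# Minimal sketch: load the LoRA fine-tune on top of the original liuhaotian
# base (not the llava-hf conversion). The checkpoint path is illustrative.
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

model_path = "checkpoints/llava-v1.5-7b-task-lora"  # your LoRA output dir
model_base = "liuhaotian/llava-v1.5-7b"             # must match the base used in training

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=model_base,
    # The builder inspects the name; it must contain "lora" to merge adapters.
    model_name=get_model_name_from_path(model_path),
)
```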
Question
Hello LLaVA Team,
I've been working on fine-tuning the LLaVA v1.5-7B model on a custom dataset using the provided finetune_task_lora.sh script. Here is the command I used:

bash scripts/v1_5/finetune_task_lora.sh
After training, these were the results:
{'train_runtime': 25078.2556, 'train_samples_per_second': 1.595, 'train_steps_per_second': 0.1, 'train_loss': 0.16062020410320182, 'epoch': 1.0}
When attempting to evaluate the model using model_vqa.py, I encountered a runtime error. The model loads correctly, but during evaluation I receive:

RuntimeError: probability tensor contains either 'inf', 'nan' or element < 0

This is the command I ran:

python llava/eval/model_vqa.py --model-path checkpoints/llava-v1.5-7b-task-lora/ --model-base checkpoints/llava-v1.5-7b/ --question-file Dataset/eval_ques.jsonl --image-folder ./ --answers-file /workspace/Dataset/eval_answer.jsonl
From the traceback, it seems that the model's hidden state outputs are all NaN.
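Until the root cause is fixed, one way to localize where the NaNs first appear is to hook every module and flag non-finite outputs. This is generic PyTorch debugging, assuming model is the LLaVA model you loaded for evaluation:

```python
# Debugging sketch: print every module whose output contains NaN/Inf during a
# forward pass; the earliest print points at where the values first blow up.
# Assumes `model` is the already-loaded LLaVA model.
import torch

def add_nan_hooks(model):
    def make_hook(name):
        def hook(module, inputs, output):
            out = output[0] if isinstance(output, tuple) else output
            if torch.is_tensor(out) and not torch.isfinite(out).all():
                print(f"non-finite output in module: {name}")
        return hook
    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))

add_nan_hooks(model)
# Run one evaluation example and watch which module prints first.
```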
Could you help me understand what might be causing this issue and how to resolve it? Thank you very much.