haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Usage] LoRA fine-tuned weights provided for vicuna-13b-v1.3 give a NaN / inf error when performing inference on COCO-2014 questions after merging LoRA weights #408

Open DefUs3r opened 1 year ago

DefUs3r commented 1 year ago

Describe the issue

Issue:

We are trying to perform inference on the LoRA weights provided for vicuna-13b-v1.3 here. As mentioned by @haotian-liu in issue #245, we perform the merging step on the LoRA weights using the following command:

python merge_lora_weights.py \
    --model-path hf_checkpoints/llava-v1-0719-336px-lora-vicuna-13b-v1.3 \
    --model-base LLaVA/checkpoints/fastchat_llama-vicuna-v1-3-13b \
    --save-model-path hf_checkpoints/llava-v1-0719-336px-lora-vicuna-13b-v1.3-MERGE
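
For context, the core of this merge step is folding the low-rank LoRA updates back into the base weights. Below is a minimal sketch of that operation, assuming the adapter loads with PEFT; the actual merge_lora_weights.py additionally handles LLaVA's multimodal weights, so this is only an illustration:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "LLaVA/checkpoints/fastchat_llama-vicuna-v1-3-13b"
lora_path = "hf_checkpoints/llava-v1-0719-336px-lora-vicuna-13b-v1.3"
save_path = "hf_checkpoints/llava-v1-0719-336px-lora-vicuna-13b-v1.3-MERGE"

# Load the base model, attach the LoRA adapter, then fold the
# low-rank updates into the base weights and drop the adapter.
base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, lora_path)
merged = model.merge_and_unload()

merged.save_pretrained(save_path)
AutoTokenizer.from_pretrained(base_path).save_pretrained(save_path)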

After this, we perform inference on the 90 COCO-2014 samples described in the paper, using:

python -m llava.eval.model_vqa \
    --model-path hf_checkpoints/llava-v1-0719-336px-lora-vicuna-13b-v1.3-MERGE \
    --question-file \
    LLaVA/playground/data/coco2014_val_qa_eval/qa90_questions.jsonl \
    --image-folder \
    LLaVA/coco/coco_dataset/val2014 \
    --answers-file \
    LLaVA/model_inference_testing/coco/coco_val2014_answers-HF-vicuna-v1-3-13b-prompt-v1-test-merge.jsonl

This inference produces the following error log:

  0%|                                                                                                                                         | 0/90 [00:00<?, ?it/s]/home/anaconda3/envs/llavacuda6/lib/python3.10/site-packages/transformers/generation/utils.py:1270: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
  warnings.warn(
  0%|                                                                                                                                         | 0/90 [00:33<?, ?it/s]
Traceback (most recent call last):
  File "/home/anaconda3/envs/llavacuda6/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/anaconda3/envs/llavacuda6/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/workspace/cgy/LLAVA/LLaVA/llava/eval/model_vqa.py", line 112, in <module>
    eval_model(args)
  File "/home/workspace/cgy/LLAVA/LLaVA/llava/eval/model_vqa.py", line 66, in eval_model
    output_ids = model.generate(
  File "/home/anaconda3/envs/llavacuda6/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/anaconda3/envs/llavacuda6/lib/python3.10/site-packages/transformers/generation/utils.py", line 1588, in generate
    return self.sample(
  File "/home/anaconda3/envs/llavacuda6/lib/python3.10/site-packages/transformers/generation/utils.py", line 2678, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
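
To tell whether the non-finite values are already present in the merged checkpoint or only appear at generation time, one quick check (a sketch, assuming the merged model is saved as sharded pytorch_model-*.bin files) is to scan the saved weights directly:

import glob
import torch

ckpt_dir = "hf_checkpoints/llava-v1-0719-336px-lora-vicuna-13b-v1.3-MERGE"
for shard in sorted(glob.glob(f"{ckpt_dir}/pytorch_model-*.bin")):
    state_dict = torch.load(shard, map_location="cpu")
    for name, tensor in state_dict.items():
        # Flag any weight tensor containing NaN or +/-inf values.
        if tensor.is_floating_point() and not torch.isfinite(tensor).all():
            print(f"non-finite values in {name} ({shard})")

If this prints nothing, the merge produced finite weights and the problem is more likely a dtype or generation-config mismatch at inference time.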

The command we use to generate the model-base passed to merge_lora_weights.py is as follows:

python -m fastchat.model.apply_delta \
    --base huggyllama/llama-13b \
    --target checkpoints/fastchat_llama-vicuna-v1-3-13b \
    --delta lmsys/vicuna-13b-v1.3
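
To rule out a faulty model-base before blaming the merge, a quick sanity check (a sketch, assuming a standard transformers setup) is to run greedy decoding on the delta-applied Vicuna checkpoint alone:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "checkpoints/fastchat_llama-vicuna-v1-3-13b"
tokenizer = AutoTokenizer.from_pretrained(base_path)
model = AutoModelForCausalLM.from_pretrained(
    base_path, torch_dtype=torch.float16, device_map="auto"
)

# Greedy decoding avoids torch.multinomial entirely, so garbage or
# repeated tokens here would point at the base weights themselves.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))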

Interestingly, the same evaluation procedure, when run with the officially provided pre-merged LoRA weights, returns:

all : 76.3
complex : 90.0
conv : 75.4
detail : 63.4

implying that either merge_lora_weights.py has an issue, the provided LoRA weights have an issue, or the model-base is faulty.

Kindly suggest fixes for whichever of these is the cause of this error.

wanghao-cst commented 1 year ago

I got the same error following the previous LoRA inference steps. link

[Screenshot 2023-09-13 09:32:28]
Cubism-star commented 1 year ago

I also got the same error when using my own fine-tuned model for inference.

wanghao-cst commented 1 year ago

Hi, have you fixed the issue?

DefUs3r commented 1 year ago

> Hi, have you fixed the issue?

No, this is not fixed yet.

terminator123 commented 11 months ago

How did you download the dataset coco/coco_dataset/val2014?

kuaileqipaoshui commented 10 months ago

> How did you download the dataset coco/coco_dataset/val2014?

Do you know how to download coco_val2014 now?

Ryosuke0104 commented 9 months ago

@Cubism-star

> I also got the same error when using my own fine-tuned model for inference.

Me too. Did you fix it?

Kamleshpaul commented 8 months ago

Any update? I also face the same issue after fine-tuning; I am not able to merge.

ChenRan2000 commented 6 months ago

Why has nobody fixed this?