haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
20.29k stars 2.24k forks

[Usage] can not apply inference with a merged mistral model after finetune it #1313

Open sayedmohamedscu opened 8 months ago

sayedmohamedscu commented 8 months ago

Describe the issue

Issue: I fine-tuned the Mistral LLaVA model on a sample dataset, and the training went well.

Here is the training command:

deepspeed llava/train/train_mem.py --deepspeed scripts/zero2.json \
  --lora_enable True --lora_r 128 \
  --lora_alpha 256 --mm_projector_lr 2e-5 --model_name_or_path liuhaotian/llava-v1.6-mistral-7b \
  --version llava_llama_2 --data_path final_dataset/train/dataset.json --image_folder final_dataset/images/ \
  --vision_tower openai/clip-vit-large-patch14-336 --mm_projector_type mlp2x_gelu --mm_vision_select_layer -2 \
  --mm_use_im_start_end False --mm_use_im_patch_token False --image_aspect_ratio pad \
  --group_by_modality_length True --bf16 True --output_dir llama-2-7b-chat-task-qlora \
  --num_train_epochs 50 --per_device_train_batch_size 8 --per_device_eval_batch_size 8 \
  --gradient_accumulation_steps 1 --evaluation_strategy "no" \
  --save_strategy "steps" --save_steps 50 --save_total_limit 1 \
  --learning_rate 2e-4 --weight_decay 0. --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" --logging_steps 1 --tf32 True \
  --model_max_length 2048 --gradient_checkpointing True \
  --dataloader_num_workers 4 --lazy_preprocess True

After training, I merged the LoRA weights with:

python scripts/merge_lora_weights.py  --model-path llama-2-7b-chat-task-qlora/checkpoint-200 --model-base liuhaotian/llava-v1.6-mistral-7b --save-model-path outputfinalm

Output when running inference through the CLI with the fine-tuned model:

python -m llava.serve.cli  --model-path 'outputfinalm'  --image-file "1.jpg"
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [02:11<00:00, 32.79s/it]
Traceback (most recent call last):
  File "/home/sayed/miniconda3/envs/llava/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/sayed/miniconda3/envs/llava/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/e/llava2/LLaVA/llava/serve/cli.py", line 128, in <module>
    main(args)
  File "/mnt/e/llava2/LLaVA/llava/serve/cli.py", line 61, in main
    image_tensor = process_images([image], image_processor, model.config)
  File "/mnt/e/llava2/LLaVA/llava/mm_utils.py", line 176, in process_images
    image = process_anyres_image(image, image_processor, model_cfg.image_grid_pinpoints)
  File "/mnt/e/llava2/LLaVA/llava/mm_utils.py", line 138, in process_anyres_image
    patches = divide_to_patches(image_padded, processor.crop_size['height'])
AttributeError: 'NoneType' object has no attribute 'crop_size'

Note that the same command works well with the original model:

python -m llava.serve.cli  --model-path 'liuhaotian/llava-v1.6-mistral-7b'  --image-file "1.jpg"

I think the problem comes from something in the training process, since I used the options below. Is there anything wrong with them?

--model_name_or_path liuhaotian/llava-v1.6-mistral-7b --version llava_llama_2

fisher75 commented 6 months ago

Hi, how do you know the training went well? Did you use the default training settings? I ran LoRA fine-tuning with the default parameters and saw basically no improvement.

yiyiwwang commented 5 months ago

I ran into a similar problem. The cause lies in the model name: try adding "llava-" to your merged model's name, then run again.

Without "llava" in the model name, the image processor will not be loaded.
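
To make the name check concrete, here is a minimal standalone sketch of the behaviour described above. It does not import LLaVA; both functions are re-implementations written for illustration, assuming the loader derives a model name from the last component of --model-path and only sets up the image processor when that name contains "llava". The helper name will_load_image_processor is hypothetical, so please check llava/mm_utils.py and llava/model/builder.py in your own checkout for the real logic.

```python
def get_model_name_from_path(model_path: str) -> str:
    """Assumed name derivation: last path component, or
    '<parent>_<checkpoint-N>' for checkpoint directories."""
    parts = model_path.strip("/").split("/")
    if len(parts) > 1 and parts[-1].startswith("checkpoint-"):
        return parts[-2] + "_" + parts[-1]
    return parts[-1]


def will_load_image_processor(model_path: str) -> bool:
    """Hypothetical helper: True if the assumed 'llava' substring
    check on the derived model name would pass."""
    return "llava" in get_model_name_from_path(model_path).lower()


if __name__ == "__main__":
    # Paths from this thread, plus a renamed merge output (illustrative name).
    for path in (
        "outputfinalm",
        "llava-outputfinalm",
        "liuhaotian/llava-v1.6-mistral-7b",
    ):
        print(f"{path}: image processor loaded = {will_load_image_processor(path)}")
```

Under these assumptions, 'outputfinalm' fails the check, which would leave image_processor as None and match the AttributeError in the traceback. Re-running the merge with a --save-model-path such as llava-outputfinalm (an illustrative name) and passing that directory to --model-path should let the check pass.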