haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Usage] Getting IndexErrors finetuning on a custom dataset #677

Open StrangeTcy opened 1 year ago

StrangeTcy commented 1 year ago

Describe the issue

Issue: We run into an indexing error when we try to finetune our LLaVA model on our custom dataset (this model has previously been pretrained and finetuned on LLaVAR).

Command:

#!/bin/bash

CUDA_VISIBLE_DEVICES=0,1  torchrun --nnodes=1 --nproc_per_node=2 --master_port=25001 \
/root/raw_data_for_llava/LLaVAR/LLaVA/llava/train/train_mem.py \
    --model_name_or_path ./llava_R_finetuned \
    --version v1 \
    --data_path /root/combined_data_for_llava/combined_conv_4.json \
    --image_folder /root/combined_data_for_llava/mixed_images \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --pretrain_mm_mlp_adapter llava_R_output/mm_projector.bin \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --bf16 True \
    --output_dir ./further_finetuning \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 8 \
    --lazy_preprocess True \
    --report_to wandb

Log:

the error is: index 8 is out of bounds for dimension 0 with size 8;
image_features is tensor([[[-0.7188, -3.2969,  0.2617,  ..., -4.1875, -1.8906,  3.0312],
 ...
 [-4.8438, -3.8906,  1.4375,  ..., -1.5859, -2.7188, -0.3633]]],
       device='cuda:1', dtype=torch.bfloat16, grad_fn=<ViewBackward0>) of len 8;
and cur_image_idx is 8

My suspicion falls on this part of `llava_arch.prepare_inputs_labels_for_multimodal`: https://github.com/haotian-liu/LLaVA/blob/f47c16e4aeac6d4d61259800ca9cd33b26824113/llava/model/llava_arch.py#L136-156, where `cur_image_idx += 1` lets `cur_image_idx` grow past the length of `image_features`.
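To illustrate what I mean, here is a minimal, self-contained sketch of that indexing pattern (not the actual upstream code; `IMAGE_TOKEN_INDEX`, the tensor shapes, and the loop structure are simplifications of the linked lines): `cur_image_idx` is advanced once per `<image>` token seen across the batch, while `image_features` has one row per image, so a sample whose conversation carries an extra or missing `<image>` tag pushes the counter past the end.

```python
import torch

IMAGE_TOKEN_INDEX = -200  # placeholder id standing in for the <image> token

# Two samples, one image each -> 2 rows of image features,
# but the second conversation accidentally contains two <image> tokens.
input_ids = [
    torch.tensor([1, IMAGE_TOKEN_INDEX, 5, 6]),
    torch.tensor([1, IMAGE_TOKEN_INDEX, 7, IMAGE_TOKEN_INDEX, 8]),
]
image_features = torch.randn(2, 4)

cur_image_idx = 0
for cur_input_ids in input_ids:
    num_image_tokens = int((cur_input_ids == IMAGE_TOKEN_INDEX).sum())
    for _ in range(num_image_tokens):
        # third lookup raises: index 2 is out of bounds for dimension 0 with size 2
        cur_image_features = image_features[cur_image_idx]
        cur_image_idx += 1
```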

haotian-liu commented 1 year ago

Hi, we have just released support for continued finetuning (this previously had some issues), along with instructions on how to format your custom dataset.

Please check out the latest code base to see if it solves your problem, thanks!

https://github.com/haotian-liu/LLaVA/blob/main/docs/Finetune_Custom_Data.md

StrangeTcy commented 1 year ago

Thanks. A couple of points:

  1. The dataset: here's the start of the file:
    
    [
    {
        "id": "bar_train_00051753.png",
        "image": "bar_train_00051753.jpg",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\nWhat is this?\n, specifically, What is the value of the smallest individual bar in the whole chart?"
            },
            {
                "from": "gpt",
                "value": "-8"
            }
        ]
    },
    {
        "id": "bar_train_00073247.png",
        "image": "bar_train_00073247.jpg",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\nRender a clear and concise summary of the photo.\n, specifically, Which object is the least preferred in any category?"
            },
            {
                "from": "gpt",
                "value": "novel"
            }
        ]
    },
    {
        "id": "bar_train_00091139.png",
        "image": "bar_train_00091139.jpg",
        "conversations": [
            {
                "from": "human",
                "value": "<image>\nGive a brief description of the image.\n, specifically, What percentage of people prefer the object weapon?"
            },
            {
                "from": "gpt",
                "value": "60"
            }
        ]

-- so, the `id`s and `image`s don't have to match so long as `id`s are unique?

2. The script you've just published is amazing, but it uses LoRA, and we're currently not sure we want to go that route. Otherwise it looks really similar to the finetuning script that's been around for a while for LLaVA 1.5.

Another thing that might interest you: if we reduce the batch size to 1, it just OOMs, so I remain suspicious about the indexing in `llava_arch`.
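To rule out the data side before blaming the indexing, here is a quick sanity-check sketch over the JSON (paths are copied from the command above; the expectation of exactly one `<image>` tag per sample is my assumption, not something stated in the docs):

```python
import json
import os
from collections import Counter

# Sanity-check sketch for the custom-data JSON. Paths are taken from the
# training command above; the "exactly one <image> tag per sample" check is
# an assumption about how the stock pipeline pairs images with samples.
DATA_PATH = "/root/combined_data_for_llava/combined_conv_4.json"
IMAGE_FOLDER = "/root/combined_data_for_llava/mixed_images"

with open(DATA_PATH) as f:
    samples = json.load(f)

# 1. ids must be unique (they do not have to match the image filename).
dup_ids = [i for i, n in Counter(s["id"] for s in samples).items() if n > 1]
print("duplicate ids:", dup_ids[:10])

# 2. every referenced image should exist under --image_folder.
missing = [s["image"] for s in samples
           if not os.path.exists(os.path.join(IMAGE_FOLDER, s["image"]))]
print("missing image files:", missing[:10])

# 3. each sample's conversations should contain exactly one "<image>" tag.
bad_tags = [s["id"] for s in samples
            if sum(t["value"].count("<image>") for t in s["conversations"]) != 1]
print("samples with != 1 <image> tag:", bad_tags[:10])
```
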
haotian-liu commented 1 year ago

> so, the ids and images don't have to match so long as ids are unique?

Yes.


We updated the docs with the finetune script:

> If the amount of the task-specific data is sufficient, you can also finetune from LLaVA checkpoints with full-model finetuning following this script.

Also, if you run into errors/warnings, please try the latest code base, as it includes fixes like https://github.com/haotian-liu/LLaVA/commit/232302ed1d8520f79cb62fa3a6213d66128ee6de

yjt-okkk commented 1 year ago

I met the same issue:

File "/home/jovyan/work/LISA/LISA/llava/LLaVA/llava/model/llava_arch.py", line 147, in prepare_inputs_labels_for_multimodal
    cur_image_features = image_features[cur_image_idx]
IndexError: index 4 is out of bounds for dimension 0 with size 4

I wonder if you have handled it. Thank you!

anas-zafar commented 3 months ago

Hi @yjt-okkk were you able to solve this?

Shahad-Mohammed commented 3 months ago

Same issue, any help?