Open StrangeTcy opened 1 year ago
Hi, we have just released support for continued finetuning (which previously had some issues), along with instructions on how to format your custom dataset.
Please check out the latest code base to see if it solves your problem, thanks!
https://github.com/haotian-liu/LLaVA/blob/main/docs/Finetune_Custom_Data.md
Thanks, I think that:
[
{
"id": "bar_train_00051753.png",
"image": "bar_train_00051753.jpg",
"conversations": [
{
"from": "human",
"value": "<image>\nWhat is this?\n, specifically, What is the value of the smallest individual bar in the whole chart?"
},
{
"from": "gpt",
"value": "-8"
}
]
},
{
"id": "bar_train_00073247.png",
"image": "bar_train_00073247.jpg",
"conversations": [
{
"from": "human",
"value": "<image>\nRender a clear and concise summary of the photo.\n, specifically, Which object is the least preferred in any category?"
},
{
"from": "gpt",
"value": "novel"
}
]
},
{
"id": "bar_train_00091139.png",
"image": "bar_train_00091139.jpg",
"conversations": [
{
"from": "human",
"value": "<image>\nGive a brief description of the image.\n, specifically, What percentage of people prefer the object weapon?"
},
{
"from": "gpt",
"value": "60"
}
]
}
]
-- so, the `id`s and `image`s don't have to match so long as the `id`s are unique?
2. The script you've just published is amazing, but it uses LoRA and we're currently not sure we wish to go that route. Otherwise it looks really similar to the finetuning script that's been around for a while for LLaVA 1.5.
Another thing that might interest you: if we reduce the batch size to 1 it just OOMs, so I remain suspicious about the indexing in `llava_arch`.
> so, the ids and images don't have to match so long as ids are unique?
Yes.
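The uniqueness condition confirmed above can be checked before training. The following is a minimal sketch, assuming the dataset is a list of records shaped like the sample earlier in this thread; `find_duplicate_ids` is a hypothetical helper name and not part of LLaVA.

```python
from collections import Counter


def find_duplicate_ids(records):
    """Return every id that occurs more than once, in first-seen order."""
    counts = Counter(rec["id"] for rec in records)
    return [rec_id for rec_id, n in counts.items() if n > 1]


# Two records copied from the sample above: the id ends in .png while the
# image ends in .jpg, which is fine as long as the ids themselves are unique.
records = [
    {"id": "bar_train_00051753.png", "image": "bar_train_00051753.jpg"},
    {"id": "bar_train_00073247.png", "image": "bar_train_00073247.jpg"},
]

print(find_duplicate_ids(records))  # → []
```

For a real run, the list would come from `json.load` on the file passed as `--data_path`.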
We updated the docs with the finetune script:
If the amount of the task-specific data is sufficient, you can also finetune from LLaVA checkpoints with full-model finetuning following this script.
Also, if you find some errors/warnings, please try the latest code base as there are fixes like https://github.com/haotian-liu/LLaVA/commit/232302ed1d8520f79cb62fa3a6213d66128ee6de
Describe the issue
Issue: We run into an indexing error when we try to finetune our LLaVA on our custom dataset (this LLaVA has previously been pretrained and finetuned on LLaVAR)
Command:
#!/bin/bash
CUDA_VISIBLE_DEVICES=0,1 torchrun --nnodes=1 --nproc_per_node=2 --master_port=25001 \
    /root/raw_data_for_llava/LLaVAR/LLaVA/llava/train/train_mem.py \
    --model_name_or_path ./llava_R_finetuned \
    --version v1 \
    --data_path /root/combined_data_for_llava/combined_conv_4.json \
    --image_folder /root/combined_data_for_llava/mixed_images \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --pretrain_mm_mlp_adapter llava_R_output/mm_projector.bin \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --bf16 True \
    --output_dir ./further_finetuning \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 8 \
    --lazy_preprocess True \
    --report_to wandb
Log:
the error is index 8 is out of bounds for dimension 0 with size 8; image_features is tensor([[[-0.7188, -3.2969, 0.2617, ..., -4.1875, -1.8906, 3.0312], ... [-4.8438, -3.8906, 1.4375, ..., -1.5859, -2.7188, -0.3633]]], device='cuda:1', dtype=torch.bfloat16, grad_fn=<ViewBackward0>) of len 8; and cur_image_idx is 8
My suspicion falls on a part of `llava_arch.prepare_inputs_labels_for_multimodal`: https://github.com/haotian-liu/LLaVA/blob/f47c16e4aeac6d4d61259800ca9cd33b26824113/llava/model/llava_arch.py#L136-156
There, `cur_image_idx += 1` leads to `cur_image_idx` growing beyond the bounds of sanity.
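The failure mode described here can be illustrated with a toy model. This is a simplified sketch and not the LLaVA code: it assumes one feature entry per image in the batch and a counter that advances once per image placeholder token. `IMAGE_TOKEN` and `consume_image_features` are hypothetical names chosen for illustration.

```python
IMAGE_TOKEN = -200  # hypothetical stand-in id for the image placeholder token


def consume_image_features(batch_input_ids, image_features):
    """Pick one feature entry per placeholder token, in batch order."""
    cur_image_idx = 0
    used = []
    for input_ids in batch_input_ids:
        for tok in input_ids:
            if tok == IMAGE_TOKEN:
                # Raises IndexError once there are more placeholders than features.
                used.append(image_features[cur_image_idx])
                cur_image_idx += 1
    return used


# One placeholder per sample and one feature per sample: the counter stays in bounds.
ok = consume_image_features([[1, IMAGE_TOKEN, 5], [IMAGE_TOKEN, 7]], ["f0", "f1"])

# Two placeholders in the first sample but still only two features overall:
# the third lookup uses index 2 on a sequence of length 2.
try:
    consume_image_features([[IMAGE_TOKEN, IMAGE_TOKEN], [IMAGE_TOKEN]], ["f0", "f1"])
except IndexError as err:
    print("overrun:", err)
```

Under this model, the counter can only run past the end when the batch holds more placeholders than feature entries, so comparing those two counts is a reasonable first thing to inspect.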
I ran into the same issue:
File "/home/jovyan/work/LISA/LISA/llava/LLaVA/llava/model/llava_arch.py", line 147, in prepare_inputs_labels_for_multimodal
cur_image_features = image_features[cur_image_idx]
IndexError: index 4 is out of bounds for dimension 0 with size 4
Have you managed to resolve it? Thank you!
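One thing that may be worth ruling out, although nothing in this thread confirms it as the cause, is a record whose conversation text contains a different number of `<image>` placeholders than the record supplies images. The sketch below assumes each record carries at most one image under the `image` key, as in the sample dataset shown earlier; `placeholder_mismatches` is a hypothetical helper, and both example records are made up for illustration.

```python
def placeholder_mismatches(records, token="<image>"):
    """Return (id, n_placeholders, n_images) for records where the counts differ."""
    mismatches = []
    for rec in records:
        n_tokens = sum(
            turn["value"].count(token) for turn in rec.get("conversations", [])
        )
        n_images = 1 if "image" in rec else 0  # assumption: one image per record
        if n_tokens != n_images:
            mismatches.append((rec.get("id"), n_tokens, n_images))
    return mismatches


# Hypothetical records: the first is consistent, the second repeats the placeholder.
examples = [
    {
        "id": "good",
        "image": "good.jpg",
        "conversations": [{"from": "human", "value": "<image>\nWhat is this?"}],
    },
    {
        "id": "bad",
        "image": "bad.jpg",
        "conversations": [{"from": "human", "value": "<image>\n<image>\nWhat is this?"}],
    },
]

print(placeholder_mismatches(examples))  # → [('bad', 2, 1)]
```

An empty result on the real dataset would suggest looking elsewhere, for example at the code fixes mentioned earlier in the thread.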
Hi @yjt-okkk were you able to solve this?
Same issue, any help?