haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Question] Training/Fine tuning on custom dataset #564

Closed: anonymous-atom closed this issue 9 months ago

anonymous-atom commented 1 year ago

Question

We want to fine-tune the LLaVA-1.5 13B model on a custom dataset. Can someone point me to, or help me with, how to fine-tune it on a custom dataset, including the required dataset format and other details?

aiaicode commented 1 year ago

This is the script to fine-tune v1.5: `scripts/v1_5/finetune.sh`. This is the script that trains the model: `llava/train/train.py`.

The fine-tuning procedure looks something like this:

  1. Create the dataset. Your dataset is supposed to be a JSON file containing a list of records in this format (a Python sketch for writing such a file follows below):

```json
{
  "id": "000000243307",
  "image": "000000243307.jpg",
  "conversations": [
    {
      "from": "human",
      "value": "Can you describe the main features of this image for me?\n<image>"
    },
    {
      "from": "gpt",
      "value": "The image features a man sitting in a chair by himself, looking at a small flat-screen TV or monitor. He appears to be playing a game, as a keyboard and a mouse can be seen in front of him."
    }
  ]
}
```

  2. Keep the images ready with matching IDs.

In the script, replace the JSON file path and replace `./playground/data` with your image folder:

```
--data_path ./playground/data/llava_v1_5_mix665k.json \
--image_folder ./playground/data \
```
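Here is a minimal Python sketch of writing such a dataset file; the record values are copied from the example above, and `my_dataset.json` is a placeholder name:

```python
import json

# Build records in the conversation format shown above. "<image>" marks where
# the image is inserted, and "image" is a file name resolved relative to the
# folder passed via --image_folder.
records = [
    {
        "id": "000000243307",
        "image": "000000243307.jpg",
        "conversations": [
            {"from": "human",
             "value": "Can you describe the main features of this image for me?\n<image>"},
            {"from": "gpt",
             "value": "The image features a man sitting in a chair by himself, "
                      "looking at a small flat-screen TV or monitor."},
        ],
    },
]

# Write the whole dataset as a single JSON array, as expected by --data_path.
with open("my_dataset.json", "w") as f:
    json.dump(records, f, indent=2)
```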

Again, I haven't done the fine-tuning myself, but this is what training looks like.

Please correct me if i am wrong @haotian-liu

clima-ai commented 1 year ago

How do we join the classes in our data to the classes already existing in the model? Should our data directory be organized in some special way, for example, each class in a sub-directory?

anonymous-atom commented 1 year ago

@aiaicode mentioned the steps. I haven't proceeded with them yet; I will start training in a week. In the meantime, I would appreciate help from anyone who has trained/fine-tuned this model.

ashhadulislam commented 1 year ago

How much CPU/GPU is needed for fine-tuning? Is there a way of sharing between RAM and GPU?

nj159 commented 11 months ago

> The steps were mentioned; although I still haven't proceeded, I will start training within a week. In the meantime, I would appreciate help from anyone who has trained/fine-tuned this model.

I am a novice. Do you know the difference between the pre-training dataset and the fine-tuning dataset of LLaVA? If we build a custom dataset for a certain field, do we only need to prepare the fine-tuning dataset? Thank you very much!

aa221 commented 11 months ago

Can I fine-tune with a CPU and no GPU? I don't see why I need that much compute power.

Hemachandirant commented 11 months ago

Hi, can anyone help with how the images should be named, e.g., 000000506095.jpg?

JAYESH1304 commented 10 months ago

Please help me understand how I should use model checkpoints for fine-tuning LLaVA.

AI-Aether commented 10 months ago

> Can I fine-tune with a CPU and no GPU? I don't see why I need that much compute power.

I think probably not, because a vision model is involved as well.

sohaibsoussi commented 8 months ago

Hi, I encountered a problem when running the shell script below for fine-tuning purposes. It always tells me that the 'llava' module is not found, even though I tried installing it using both conda and pip:

```
deepspeed /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/LLaVA/llava/train/train_mem.py \
    --deepspeed /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/LLaVA/scripts/zero2.json \
    --lora_enable True \
    --lora_r 128 \
    --lora_alpha 256 \
    --mm_projector_lr 2e-5 \
    --bits 4 \
    --model_name_or_path /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/llava-v1.5-7b \
    --version llava_llama_2 \
    --data_path /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/ok_vqa_dataset/train \
    --validation_data_path /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/ok_vqa_dataset/validation \
    --image_folder /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/ok_vqa_dataset/images \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/checkpoints/ok_vqa_finetuning \
    --num_train_epochs 500 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 64 \
    --evaluation_strategy "epoch" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb
```
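A module-not-found error for `llava` usually means the package is not importable in the environment the command runs in, rather than a problem with the flags themselves; the repo's documented install is `pip install -e .` from the LLaVA checkout root. A minimal sanity check in Python, under that assumption:

```python
# Check whether the `llava` package is importable in the current environment.
# Assumes the repo was installed with `pip install -e .` from its root, or that
# the command is launched from a directory where `llava` is on PYTHONPATH.
import importlib.util

spec = importlib.util.find_spec("llava")
print("llava found at:", spec.origin if spec else None)  # None means not importable
```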

thebealfarisi commented 8 months ago

How do I fine-tune LLaVA on a non-image dataset?

Using this format:

```json
[
  {
    "id": "unique_id",
    "image": "",
    "conversations": [
      { "from": "human", "value": "{question}" },
      { "from": "gpt", "value": "{answer}" }
    ]
  }
]
```

Or this format: `[INST] <<SYS>> {prompt} <</SYS>> {question} [/INST] {answer}`
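A text-only record can likely follow the same conversations schema; in the repo's llava_v1_5_mix665k mix, text-only samples simply omit the "image" key and the `<image>` token. A minimal sketch under that assumption (the file name is a placeholder):

```python
import json

# Hypothetical text-only record: no "image" key and no "<image>" token in the
# human turn, mirroring the text-only samples in llava_v1_5_mix665k.json.
record = {
    "id": "unique_id",
    "conversations": [
        {"from": "human", "value": "{question}"},
        {"from": "gpt", "value": "{answer}"},
    ],
}

# Write the dataset as a JSON array, matching the format used for --data_path.
with open("text_only_dataset.json", "w") as f:
    json.dump([record], f, indent=2)
```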