Closed: anonymous-atom closed this issue 9 months ago
The script to fine-tune v1.5 is the v1.5 fine-tuning script in the repo; the script that actually trains the model is train.py.
The fine-tuning procedure looks something like this:
{
  "id": "000000243307",
  "image": "000000243307.jpg",
  "conversations": [
    {
      "from": "human",
      "value": "Can you describe the main features of this image for me?\n<image>"
    },
    {
      "from": "gpt",
      "value": "The image features a man sitting in a chair by himself, looking at a small flat-screen TV or monitor. He appears to be playing a game, as a keyboard and a mouse can be seen in front of him."
    }
  ]
},
Keep the images ready with matching IDs.
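The id/filename matching can be checked mechanically before training. A small sketch (the helper name is mine, not from LLaVA) that lists every JSON entry whose image file is missing from the image folder:

```python
import json
import os

def find_missing_images(data_json, image_folder):
    """Return the 'image' filenames referenced in the dataset JSON
    that do not exist inside image_folder."""
    with open(data_json) as f:
        entries = json.load(f)
    return [
        e["image"]
        for e in entries
        # Entries without an "image" key are text-only and are skipped.
        if "image" in e
        and not os.path.exists(os.path.join(image_folder, e["image"]))
    ]
```

Running this once before launching training catches id/filename mismatches early, instead of mid-epoch.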
--data_path ./playground/data/llava_v1_5_mix665k.json \
--image_folder ./playground/data \
In the script, replace the JSON file path with your own, and replace ./playground/data with your image folder.
Again, I haven't done fine-tuning myself, but this is what training looks like.
Please correct me if I am wrong @haotian-liu
How do we join the classes in our data to the classes already existing in the model? Should our directory (data) be organized in some special way, for example each class in a sub-directory?
@aiaicode mentioned the steps. I still haven't proceeded with it, but I will start training in a week. In the meantime, I would appreciate help from someone who has trained/fine-tuned this model.
How much CPU/GPU is needed for fine-tuning? Is there a way of sharing memory between RAM and the GPU?
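On sharing between RAM and the GPU: DeepSpeed's ZeRO stage 3 can offload optimizer state and parameters into CPU memory, trading step speed for a much smaller GPU footprint. A config sketch along the lines of the zero3 configs shipped with the repo (treat the exact values as assumptions, not a tested recipe):

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true }
  },
  "bf16": { "enabled": "auto" },
  "train_micro_batch_size_per_gpu": "auto"
}
```

You would pass a file like this via the --deepspeed flag instead of zero2.json; each step gets slower, but optimizer state lives in system RAM rather than VRAM.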
I am a novice. Do you know the difference between LLaVA's pre-training dataset and its fine-tuning dataset? If we build a custom dataset for a particular domain, do we only need to prepare the fine-tuning dataset? Thank you very much!
Can I fine-tune with a CPU and no GPU? I don't see why I need that much compute power.
Hi, can anyone help with how the images should be named? E.g., 000000506095.jpg
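As far as I can tell, the filename only has to match the "image" field of the corresponding JSON entry; the 000000506095.jpg style is just the COCO convention of zero-padding the numeric image id to 12 digits. A sketch (the helper name is mine, and the convention is an illustration, not a requirement):

```python
def coco_name(image_id: int) -> str:
    """COCO-style filename: the numeric id zero-padded to 12 digits."""
    return f"{image_id:012d}.jpg"

print(coco_name(506095))  # prints: 000000506095.jpg
```

Whatever naming you pick, the string in each entry's "image" field must equal the filename on disk inside --image_folder.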
Please help me understand how I should use model checkpoints for fine-tuning LLaVA.
Can I fine-tune with a CPU and no GPU? I don't see why I need that much compute power.
I think probably not, because a vision model is involved as well.
Hi, I encountered a problem when running the shell script below for fine-tuning. It always tells me that the 'llava' module is not found, even though I tried installing it using both conda and pip:
deepspeed /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/LLaVA/llava/train/train_mem.py \
    --deepspeed /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/LLaVA/scripts/zero2.json \
    --lora_enable True \
    --lora_r 128 \
    --lora_alpha 256 \
    --mm_projector_lr 2e-5 \
    --bits 4 \
    --model_name_or_path /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/llava-v1.5-7b \
    --version llava_llama_2 \
    --data_path /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/ok_vqa_dataset/train \
    --validation_data_path /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/ok_vqa_dataset/validation \
    --image_folder /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/ok_vqa_dataset/images \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/checkpoints/ok_vqa_finetuning \
    --num_train_epochs 500 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_step 64 \
    --evaluation_strategy "epoch" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb
Log after running the shell script:
deepspeed /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/LLaVA/llava/train/train_mem.py
[2024-02-20 23:14:05,751] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-20 23:14:07,655] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-02-20 23:14:07,711] [INFO] [runner.py:568:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/LLaVA/llava/train/train_mem.py
[2024-02-20 23:14:10,006] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-20 23:14:10,383] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2024-02-20 23:14:10,383] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2024-02-20 23:14:10,383] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2024-02-20 23:14:10,383] [INFO] [launch.py:163:main] dist_world_size=1
[2024-02-20 23:14:10,383] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
[2024-02-20 23:14:10,384] [INFO] [launch.py:253:main] process 1377826 spawned with command: ['/usr/bin/python3', '-u', '/home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/LLaVA/llava/train/train_mem.py', '--local_rank=0']
Traceback (most recent call last):
File "/home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/LLaVA/llava/train/train_mem.py", line 1, in
--deepspeed /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/LLaVA/scripts/zero2.json train_ok_vqa.sh: 4: --deepspeed: not found
--lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 train_ok_vqa.sh: 5: --lora_enable: not found
--bits 4 --model_name_or_path /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/llava-v1.5-7b --version llava_llama_2 train_ok_vqa.sh: 9: --bits: not found
--data_path /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/ok_vqa_dataset/train --validation_data_path /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/ok_vqa_dataset/validation --image_folder /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/ok_vqa_dataset/images --vision_tower openai/clip-vit-large-patch14-336 --mm_projector_type mlp2x_gelu --mm_vision_select_layer -2 --mm_use_im_start_end False --mm_use_im_patch_token False --image_aspect_ratio pad --group_by_modality_length True --bf16 True --output_dir /home/sohaib/LPRI_projects/Image-txt--txt/LLAVA/checkpoints/ok_vqa_finetuning train_ok_vqa.sh: 12: --data_path: not found
--num_train_epochs 500 --per_device_train_batch_size 2 --per_device_eval_batch_size 2 --gradient_accumulation_step 64 --evaluation_strategy epoch --save_strategy steps --save_steps 50000 --save_total_limit 1 --learning_rate 2e-4 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type cosine --logging_steps 1 --tf32 True --model_max_length 2048 --gradient_checkpointing True --dataloader_num_workers 4 --lazy_preprocess True --report_to wandb train_ok_vqa.sh: 24: --num_train_epochs: not found
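Two separate issues seem to show up in this log. First, the 'llava' module error usually means the repo itself was never installed into the active environment; running pip install -e . from the LLaVA repo root (or putting the repo on PYTHONPATH) normally fixes it. Second, the "--deepspeed: not found" lines mean the backslash line continuations in the .sh file were lost, e.g. a trailing space after a backslash, so each option line is executed as a command of its own. A minimal demonstration of the continuation behavior:

```shell
# From the LLaVA repo root, install the package so `import llava` resolves:
#   pip install -e .

# A backslash continues a line only when it is the very last character on
# that line. Here the continuation is intact, so both lines run as one
# command:
cat > /tmp/demo.sh <<'EOF'
echo run \
--flag value
EOF
sh /tmp/demo.sh   # prints: run --flag value
# With a space after the backslash, the shell would instead try to execute
# "--flag" as its own command and report "--flag: not found".
```

So check the script for trailing whitespace after each backslash (and for Windows CRLF line endings, which break continuations the same way).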
How do I fine-tune LLaVA on a non-image dataset?
using this format:
[
  {
    "id": "unique_id",
    "image": "",
    "conversations": [
      { "from": "human", "value": "{question}" },
      { "from": "gpt", "value": "{answer}" }
    ]
  }
]
or this format:
[INST] <<SYS>>
{system prompt}
<</SYS>>

{question} [/INST]
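On the JSON variant: from my reading of LLaVA's dataset loader, text-only samples should simply omit the "image" key rather than set it to an empty string (an empty path would still be looked up on disk). This is an assumption on my part, not confirmed by the maintainers. A sketch, with the helper name my own:

```python
import json

def text_only_entry(uid, question, answer):
    """Build a language-only sample: no "image" key and no <image> token
    in the prompt."""
    return {
        "id": uid,
        "conversations": [
            {"from": "human", "value": question},
            {"from": "gpt", "value": answer},
        ],
    }

print(json.dumps([text_only_entry("0001", "What is 2+2?", "4")], indent=2))
```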
We want to train the LLaVA-1.5 13B model on a custom dataset. Can someone point me to, or help me with, how to fine-tune it on a custom dataset, including the required dataset format and other details?
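Not an authoritative answer, but the conversation-style JSON shown earlier in this thread is what the v1.5 fine-tuning script consumes, so converting a custom dataset usually amounts to emitting records in that shape. A sketch (the function name and the sample row are mine):

```python
import json

def to_llava_record(uid, image_filename, question, answer):
    """One supervised sample in the conversation format used above.
    The <image> token marks where the image is spliced into the prompt."""
    return {
        "id": uid,
        "image": image_filename,
        "conversations": [
            {"from": "human", "value": f"{question}\n<image>"},
            {"from": "gpt", "value": answer},
        ],
    }

# Example: convert (id, filename, question, answer) rows into a train JSON.
rows = [("0001", "0001.jpg", "What animal is shown?", "A cat.")]
with open("custom_train.json", "w") as f:
    json.dump([to_llava_record(*r) for r in rows], f, indent=2)
```

The image files then go into the directory you pass as --image_folder, named exactly as in each record's "image" field.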