haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Question] loss is too low, only 0.3 #956

an1018 opened 10 months ago

an1018 commented 10 months ago

Question

Training with custom data (12,276 images, plus 30 images from llava158K), the loss dropped to 0.4 after 47 iterations.

Training script:

```bash
deepspeed --include localhost:5,6 --master_port 29585 \
    llava/train/train_mem.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path ./llava-v1.5-13b \
    --version v1 \
    --data_path ../dataset/llava_finetune/train.json \
    --image_folder ../dataset/llava_finetune/ \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir ./checkpoints/llava-v1.5-13b-task-lora-nuimages \
    --num_train_epochs 10 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 384 \
    --save_total_limit 11 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb
```

This is my prompt: [image]
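For context on where step 47 falls in this run, here is a rough back-of-the-envelope check (a minimal sketch, assuming the two GPUs, batch size, and ~12,276 samples from the command above; the exact count may differ if any JSON entries are filtered out during loading):

```python
# Estimate how far 47 optimizer steps gets through one epoch of this run.
# Assumptions taken from the command above: 2 GPUs (localhost:5,6),
# per_device_train_batch_size=16, gradient_accumulation_steps=1, ~12,276 samples.
num_samples = 12_276
num_gpus = 2
per_device_bs = 16
grad_accum = 1

global_batch = num_gpus * per_device_bs * grad_accum        # 32 samples per step
steps_per_epoch = -(-num_samples // global_batch)           # ceil division -> 384
samples_seen_at_47 = 47 * global_batch                      # 1,504 samples

print(f"global batch size       : {global_batch}")
print(f"optimizer steps / epoch : {steps_per_epoch}")
print(f"samples seen by step 47 : {samples_seen_at_47} "
      f"({samples_seen_at_47 / num_samples:.0%} of one epoch)")
```

With these settings, 47 iterations corresponds to only about 12% of the first epoch, and one epoch is roughly 384 steps (which matches the `--save_steps 384` setting above).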

I have already tried changing the dataset twice; could you give me some advice? Thank you very much.

an1018 commented 10 months ago

@haotian-liu Could you take a look and tell me what I should change?

Jeckinchen commented 8 months ago

any update?

fisher75 commented 6 months ago

Hi, how do you know the training was effective? Did you use the default training settings? I ran LoRA with the default parameters and saw basically no improvement.
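One simple way to eyeball whether a LoRA fine-tune actually changed the model's behaviour is to run the same image and question through the base model and through the LoRA checkpoint and compare the outputs. The sketch below follows the `eval_model` usage shown in this repo's README; the checkpoint and image paths are just illustrative (they reuse the paths from the training command earlier in this thread), so adjust them to your own run:

```python
# Minimal sketch: compare base llava-v1.5-13b against a LoRA checkpoint on one example.
# Paths are illustrative; eval_model usage follows the repo README example.
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

base_model = "./llava-v1.5-13b"
lora_ckpt = "./checkpoints/llava-v1.5-13b-task-lora-nuimages"
image_file = "path/to/one_of_your_finetune_images.jpg"   # hypothetical image path
query = "Describe this image."                           # swap in your own prompt

def run(model_path, model_base):
    args = type("Args", (), {
        "model_path": model_path,
        "model_base": model_base,
        "model_name": get_model_name_from_path(model_path),
        "query": query,
        "conv_mode": None,
        "image_file": image_file,
        "sep": ",",
        "temperature": 0,
        "top_p": None,
        "num_beams": 1,
        "max_new_tokens": 256,
    })()
    eval_model(args)

run(base_model, None)        # base model only
run(lora_ckpt, base_model)   # LoRA weights loaded on top of the base model
```

If the two outputs are essentially identical on examples from your fine-tuning data, the LoRA weights are probably not being loaded or the training had little effect.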

bang123-box commented 3 weeks ago

> Training with custom data (12,276 images, 30 images from llava158K), after 47 iterations the loss dropped to 0.4. [...] I have tried to change the dataset twice, could you give me some advice? Thank you very much

Same here. I use my custom data to pretrain and then LoRA SFT LLaVA. When pretraining is done, the loss is around 0.4-0.5; in the SFT stage, the loss decreases rapidly from 1.0 to 0.4 and then stays around 0.2-0.4.
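Since these runs report to wandb (`--report_to wandb`), one way to check whether the SFT loss has really plateaued around 0.2-0.4 or is still trending down is to pull the logged loss back and smooth it. A small sketch below; the run path is a placeholder for your own entity/project/run, and the metric key may be `"loss"` or `"train/loss"` depending on your transformers version:

```python
# Sketch: fetch the logged training loss from wandb and smooth it to see the trend.
# The run path and metric key are assumptions; adjust them to your own run.
import wandb

api = wandb.Api()
run = api.run("your-entity/your-project/your-run-id")     # hypothetical run path

history = run.history(keys=["train/loss"])                # pandas DataFrame
smoothed = history["train/loss"].rolling(window=20, min_periods=1).mean()

tail = smoothed.tail(50)
print(f"last smoothed loss              : {tail.iloc[-1]:.3f}")
print(f"change over last 50 logged steps: {tail.iloc[-1] - tail.iloc[0]:+.3f}")
```

A small negative change over the last stretch suggests the loss is still slowly decreasing rather than truly flat; a near-zero change means it has plateaued at that level.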