Open · simplelifetime opened this issue 4 months ago
@simplelifetime Could you share your loss curves for both llava-next-vicuna-7b and llava-next-llama3?
+1, thanks
Hi @simplelifetime,
Regarding your question about llama3: I am getting a loss of zero during the fine-tuning stage. Did you also see the same loss values?
{'loss': 1.9166, 'learning_rate': 2.0876826722338203e-08, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 4.1753653444676405e-08, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 6.263048016701463e-08, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 8.350730688935281e-08, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.0438413361169103e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.2526096033402926e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.4613778705636743e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.6701461377870562e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.8789144050104384e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 2.0876826722338207e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 2.2964509394572026e-07, 'epoch': 0.0}
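One possible explanation I am still checking (an assumption on my side, not something confirmed in the repo): if the llava_llama_3 conversation template ends up masking every target token to the ignore index, no token contributes to the cross-entropy loss and the logged value collapses as shown above. A minimal sketch for inspecting one batch from the training dataloader, assuming labels follow the usual HuggingFace/LLaVA convention of -100 for ignored positions (`batch` is a hypothetical batch variable):

```python
import torch

IGNORE_INDEX = -100  # label id ignored by the loss (HF/LLaVA convention)

def count_supervised_tokens(labels: torch.Tensor) -> int:
    """Number of label tokens that actually contribute to the loss."""
    return int((labels != IGNORE_INDEX).sum().item())

# `batch` is assumed to be one batch yielded by the training dataloader.
# If this prints 0 for every batch, all targets are masked and the logged
# loss is meaningless, pointing at the prompt/label preprocessing for the
# chosen --version rather than at the optimizer settings.
# print(count_supervised_tokens(batch["labels"]))
```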
The following is the command I use for fine-tuning:
export BASE_LR=2e-5        # base learning rate (passed to --learning_rate)
export VIT_LR=2e-6         # learning rate for the unfrozen vision tower (--mm_vision_tower_lr)
DEVICE_BATCH_SIZE=2        # per-GPU train batch size
GRADIENT_ACCU_STEPS=2      # gradient accumulation steps
deepspeed llava/train/train_mem.py \
--deepspeed ./scripts/zero2.json \
--model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
--version llava_llama_3 \
--data_path ${DATA_PATH} \
--image_folder ${LLaVA_PATH}/data \
--vision_tower openai/clip-vit-large-patch14-336 \
--mm_projector_type mlp2x_gelu \
--pretrain_mm_mlp_adapter ${OUTPUT}/checkpoints/llava-v1.6-8b_llama3-8b_pretrain_lcs-558k_ft-mlp-lr-1e-3/mm_projector.bin \
--unfreeze_mm_vision_tower True \
--mm_vision_tower_lr ${VIT_LR} \
--image_aspect_ratio anyres \
--group_by_modality_length True \
--mm_vision_select_layer -2 \
--mm_vision_select_feature patch \
--mm_patch_merge_type spatial_unpad \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--bf16 True \
--output_dir ${OUTPUT}/checkpoints/${SAVE_PATH} \
--num_train_epochs 1 \
--per_device_train_batch_size ${DEVICE_BATCH_SIZE} \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps ${GRADIENT_ACCU_STEPS} \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 7975 \
--save_total_limit 1 \
--learning_rate ${BASE_LR} \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 6144 \
--gradient_checkpointing True \
--dataloader_num_workers 4 \
--lazy_preprocess True \
--report_to wandb \
--run_name ${SAVE_PATH}
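For reference, the effective global batch size of this run is DEVICE_BATCH_SIZE × GRADIENT_ACCU_STEPS × the number of GPUs deepspeed launches on. The GPU count is not shown in the command above, so the sketch below uses an assumed value purely for illustration; a mismatch in effective batch size is one benign reason loss curves can look different between runs.

```python
# Sketch: effective global batch size for the command above.
# The GPU count is an assumption; substitute whatever deepspeed actually uses.
def effective_batch_size(device_batch: int, grad_accum: int, num_gpus: int) -> int:
    return device_batch * grad_accum * num_gpus

print(effective_batch_size(2, 2, 8))   # e.g. 8 GPUs -> global batch of 32
```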
Thanks for your great work! I'm wondering if you could share the loss curve for training llava-next-llama3. I've observed some behaviour that differs from training llava-next-vicuna-7b, and I'd like to know whether that's normal or whether I made a mistake during training.