xiaoachen98 / Open-LLaVA-NeXT

An open-source implementation for training LLaVA-NeXT.

loss curve of llava-next-llama3 #12

Open simplelifetime opened 4 months ago

simplelifetime commented 4 months ago

Thanks for your great work! Could you share the loss curve for training llava-next-llama3? I've observed some behaviour that differs from training llava-next-vicuna-7b, and I'm wondering whether it's normal or whether I made a mistake during training.

hkunzhe commented 4 months ago

@simplelifetime Could you share your loss curves for both llava-next-vicuna-7b and llava-next-llama3?
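
One easy way to share it: the Hugging Face Trainer writes a trainer_state.json into every checkpoint directory, and its log_history field contains every logged loss value. A minimal sketch for dumping the curve (the checkpoint path is a placeholder):

import json

# trainer_state.json is written by the Hugging Face Trainer inside each checkpoint dir
with open("checkpoints/<run_name>/checkpoint-XXXX/trainer_state.json") as f:
    state = json.load(f)

# log_history holds one dict per logging step, e.g. {'loss': ..., 'learning_rate': ..., 'step': ...}
points = [(entry["step"], entry["loss"]) for entry in state["log_history"] if "loss" in entry]
for step, loss in points:
    print(step, loss)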

homiec commented 4 months ago

+1, thanks

mmderakhshani commented 1 month ago

Hi @simplelifetime

Regarding your question about llama3, I am getting a loss of zero in the fine-tuning stage. Did you also get the same loss values?

{'loss': 1.9166, 'learning_rate': 2.0876826722338203e-08, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 4.1753653444676405e-08, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 6.263048016701463e-08, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 8.350730688935281e-08, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.0438413361169103e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.2526096033402926e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.4613778705636743e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.6701461377870562e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 1.8789144050104384e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 2.0876826722338207e-07, 'epoch': 0.0}
{'loss': 0.0, 'learning_rate': 2.2964509394572026e-07, 'epoch': 0.0}
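
For reference, with LLaVA-style preprocessing a loss that collapses to exactly 0.0 usually means that (almost) every target token in the batch was masked to IGNORE_INDEX (-100), which can happen when the llava_llama_3 conversation template / separators are not matched during tokenization. A quick sanity check on a collated batch before launching the full run (the batch layout follows the upstream LLaVA collator; treat the names as assumptions):

IGNORE_INDEX = -100  # sentinel used by the LLaVA preprocessing for masked label positions

def count_supervised_tokens(batch):
    """Print how many label tokens in a collated batch actually contribute to the loss."""
    labels = batch["labels"]  # torch.LongTensor of shape (batch_size, seq_len)
    supervised = int((labels != IGNORE_INDEX).sum())
    print(f"supervised tokens: {supervised} / {labels.numel()}")
    if supervised == 0:
        print("all targets are masked -> the logged loss will collapse to 0; "
              "check the llava_llama_3 template / separator handling")

Running this over the first few batches of the SFT dataloader makes it easy to tell whether the problem is in the data preprocessing rather than in the optimizer settings.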

The following is my fine-tuning script (DATA_PATH, LLaVA_PATH, OUTPUT, and SAVE_PATH are set elsewhere):


export BASE_LR=2e-5
export VIT_LR=2e-6
DEVICE_BATCH_SIZE=2
GRADIENT_ACCU_STEPS=2

deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --version llava_llama_3 \
    --data_path ${DATA_PATH} \
    --image_folder ${LLaVA_PATH}/data \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --pretrain_mm_mlp_adapter ${OUTPUT}/checkpoints/llava-v1.6-8b_llama3-8b_pretrain_lcs-558k_ft-mlp-lr-1e-3/mm_projector.bin \
    --unfreeze_mm_vision_tower True \
    --mm_vision_tower_lr ${VIT_LR} \
    --image_aspect_ratio anyres \
    --group_by_modality_length True \
    --mm_vision_select_layer -2 \
    --mm_vision_select_feature patch \
    --mm_patch_merge_type spatial_unpad \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --output_dir ${OUTPUT}/checkpoints/${SAVE_PATH} \
    --num_train_epochs 1 \
    --per_device_train_batch_size ${DEVICE_BATCH_SIZE} \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps ${GRADIENT_ACCU_STEPS} \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 7975 \
    --save_total_limit 1 \
    --learning_rate ${BASE_LR} \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 6144 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb \
    --run_name ${SAVE_PATH}
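
One additional note on the log above: the very small learning rates at the start are expected, not a symptom. With warmup_ratio 0.03 and a cosine schedule, the learning rate ramps up linearly as BASE_LR * step / warmup_steps, and the first logged value of ~2.09e-08 together with BASE_LR = 2e-5 implies roughly 958 warmup steps, i.e. on the order of 32k optimizer steps for the epoch. A back-of-the-envelope sketch; the GPU count and dataset size below are assumptions to plug your own numbers into:

import math

BASE_LR = 2e-5
DEVICE_BATCH_SIZE = 2
GRADIENT_ACCU_STEPS = 2
NUM_GPUS = 8              # assumption: adjust to your setup
DATASET_SIZE = 1_000_000  # assumption: number of SFT samples
WARMUP_RATIO = 0.03

global_batch = DEVICE_BATCH_SIZE * GRADIENT_ACCU_STEPS * NUM_GPUS
max_steps = math.ceil(DATASET_SIZE / global_batch)   # num_train_epochs = 1
warmup_steps = math.ceil(WARMUP_RATIO * max_steps)

print(f"global batch size   : {global_batch}")
print(f"optimizer steps     : {max_steps}")
print(f"warmup steps        : {warmup_steps}")
print(f"lr after first step : {BASE_LR / warmup_steps:.3e}")  # compare with the logged learning_rate

So a zero loss alongside a sane learning-rate curve points back at the labels/preprocessing rather than at the schedule.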