Luo-Z13 closed this issue 8 months ago
Hi @Luo-Z13, thank you for your interest. We trained the model on 4 A100 40 GB GPUs. You can also train on a single A100 80 GB, or on a single A100 40 GB by using the quantised models, in 4- or 8-bit.
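For the quantised route, a hedged sketch of what the invocation might look like: LLaVA's training code (which GeoChat builds on) exposes a `--bits` flag for 4-/8-bit QLoRA-style training, but please verify the flag exists in your checkout of `train_mem.py` before relying on it — the batch-size values below are illustrative, not tested settings.

```shell
# Hedged sketch for a single 40 GB A100 (flag and values are assumptions):
python geochat/train/train_mem.py \
    --lora_enable True \
    --bits 4 \
    --model_name_or_path ./llava-v1.5-7b \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 8 \
    ...   # remaining arguments as in the full fine-tuning script
```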
How long did your model training take?
Hi @vvuonghn, we fine-tuned the model for around 10 hours on the complete dataset, and further fine-tuned for 4-5 hours on the grounding part of the dataset. Please let me know if you have any further queries.
Hi @KjAeRsTuIsK ,
Thanks for your nice work.
May I ask how to fine-tune the model on the grounding part of the dataset?
I have already fine-tuned it with the following script:
################## VICUNA ##################
PROMPT_VERSION=v1
MODEL_VERSION="vicuna-v1.5-7b"
gpu_ids=0,1,2,3
################## VICUNA ##################
deepspeed --master_port=$((RANDOM + 10000)) --include localhost:$gpu_ids geochat/train/train_mem.py \
--deepspeed ./scripts/zero2.json \
--lora_enable True \
--model_name_or_path /data/.../geochat/llava-v1.5-7b \
--version $PROMPT_VERSION \
--data_path /data/.../geochat/GeoChat_Instruct.json \
--image_folder /data/.../geochat/final_images_llava \
--vision_tower openai/clip-vit-large-patch14-336 \
--mm_projector_type mlp2x_gelu \
--pretrain_mm_mlp_adapter /data/.../geochat/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/mm_projector.bin \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--image_aspect_ratio pad \
--bf16 True \
--output_dir /data/.../geochat/outckpts/geochat_reproduce \
--num_train_epochs 1 \
--per_device_train_batch_size 18 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 2 \
--evaluation_strategy "no" \
--save_strategy "epoch" \
--save_steps 7000 \
--save_total_limit 1 \
--learning_rate 2e-4 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True \
--dataloader_num_workers 16 \
--report_to wandb
What should I do next to fine-tune it on the grounding part of the dataset?
I am not very familiar with fine-tuning LLaVA. Could you give me more detailed instructions when you have time?
Best regards
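For anyone trying to carve the grounding subset out of `GeoChat_Instruct.json` themselves, here is a minimal sketch. It assumes the file is a LLaVA-style list of samples with a `conversations` field, and that grounding answers can be recognised by a bounding-box token in the `gpt` turns — the `{<` marker below is a guess, so check the actual file for the real format before using it.

```python
import json

# ASSUMPTION: grounding answers embed bounding-box tokens in the text;
# the exact marker string must be checked against GeoChat_Instruct.json.
BOX_MARKER = "{<"

def filter_grounding(samples):
    """Keep only LLaVA-style samples whose 'gpt' turns contain the
    (assumed) bounding-box marker, i.e. the grounding subset."""
    return [
        s for s in samples
        if any(t.get("from") == "gpt" and BOX_MARKER in t.get("value", "")
               for t in s.get("conversations", []))
    ]

# Usage (paths are illustrative):
#   data = json.load(open("GeoChat_Instruct.json"))
#   json.dump(filter_grounding(data), open("grounding_subset.json", "w"))
```

The resulting JSON could then be passed as `--data_path` in the same script, resuming from the first-stage LoRA checkpoint.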
Hello, I'm wondering about the minimum GPU memory required for training. Could you provide some information on this?