xiaokj37 opened 5 months ago
Hi @SeuXiao,
I appreciate your interest in our work. Please note that the Video-ChatGPT code is designed to run on a single node with multiple GPUs.
If you face any issues, please let me know. Good luck!
Thanks for your reply. I'd like to train Video-ChatGPT on my custom dataset, and my server is equipped with 8 RTX 4090 GPUs. When I launch training with torchrun, I get a CUDA out-of-memory error. Does Video-ChatGPT require 40 GB of memory on each GPU?
Hi @SeuXiao,
Video-ChatGPT uses a 7B LLM, which requires at least 17 GB of memory just to load. Considering the other model components and the optimizer states, I believe a 32 GB GPU might work.
However, please note that the code has been tested on A100 40 GB GPUs.
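As a side note, each RTX 4090 has 24 GB of VRAM, which is below the 32 GB estimate above, so an out-of-memory error with the default settings is not surprising. A quick, generic way to confirm the per-GPU memory before launching training (plain nvidia-smi, nothing specific to Video-ChatGPT):

```shell
# Print each GPU's name and total memory; an RTX 4090 reports roughly 24 GB.
nvidia-smi --query-gpu=name,memory.total --format=csv
```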
Thanks for open-sourcing Video-ChatGPT; I really like this work. I am now trying to train Video-ChatGPT, but I only have a single-node server with 8 RTX 4090 GPUs. I would like to ask how I can modify the initial training command below, which I understand targets multiple nodes, so that it runs on my setup.
```shell
torchrun --nproc_per_node=8 --master_port 29001 video_chatgpt/train/train_mem.py \
          --model_name_or_path <path to LLaVA-7B-Lightening-v-1-1 model> \
          --version v1 \
          --data_path <path to the video_chatgpt data prepared using convert_instruction_json_to_training_format.py script> \
          --video_folder <path to the spatio-temporal features generated in step 4 using save_spatio_temporal_clip_features.py script> \
          --tune_mm_mlp_adapter True \
          --mm_use_vid_start_end \
          --bf16 True \
          --output_dir ./Video-ChatGPT_7B-1.1_Checkpoints \
          --num_train_epochs 3 \
          --per_device_train_batch_size 4 \
          --per_device_eval_batch_size 4 \
          --gradient_accumulation_steps 1 \
          --evaluation_strategy "no" \
          --save_strategy "steps" \
          --save_steps 3000 \
          --save_total_limit 3 \
          --learning_rate 2e-5 \
          --weight_decay 0. \
          --warmup_ratio 0.03 \
          --lr_scheduler_type "cosine" \
          --logging_steps 100 \
          --tf32 True \
          --model_max_length 2048 \
          --gradient_checkpointing True \
          --lazy_preprocess True
```
Looking forward to your reply, thank you very much.
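For what it's worth, the command above is already single-node: `torchrun --nproc_per_node=8` launches 8 workers on one machine, so no multi-node changes are needed. For the out-of-memory error on 24 GB cards, a common first attempt (a hedged sketch, not an official recommendation from the authors, and it may still OOM for full fine-tuning of a 7B model) is to lower `--per_device_train_batch_size` and raise `--gradient_accumulation_steps` so the effective batch size is unchanged:

```shell
# Sketch of the batch/accumulation trade-off only; everything not shown here
# stays exactly as in the command above. Effective batch size is preserved:
# 8 GPUs x 1 x 4 = 32, matching the original 8 x 4 x 1 = 32.
# --gradient_checkpointing and --bf16 are already enabled in the original
# command and are the other two main memory savers.
torchrun --nproc_per_node=8 --master_port 29001 video_chatgpt/train/train_mem.py \
          --per_device_train_batch_size 1 \
          --gradient_accumulation_steps 4 \
          <remaining arguments unchanged from the command above>
```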