mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International

Single Node Training #111

Open xiaokj37 opened 5 months ago

xiaokj37 commented 5 months ago

Thanks for open-sourcing Video-ChatGPT; I really like this work. I am now trying to train Video-ChatGPT, but I only have a single-node server with 8 RTX 4090 GPUs. How can I modify the initial training command below, which I understand is meant for multiple nodes?

```bash
torchrun --nproc_per_node=8 --master_port 29001 video_chatgpt/train/train_mem.py \
    --model_name_or_path <path to LLaVA-7B-Lightening-v-1-1 model> \
    --version v1 \
    --data_path <path to the video_chatgpt training JSON prepared using the convert_instruction_json_to_training_format.py script> \
    --video_folder <path to the spatio-temporal features generated in step 4 using the save_spatio_temporal_clip_features.py script> \
    --tune_mm_mlp_adapter True \
    --mm_use_vid_start_end \
    --bf16 True \
    --output_dir ./Video-ChatGPT_7B-1.1_Checkpoints \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 3000 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 100 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True
```

Looking forward to your reply, thank you very much.

mmaaz60 commented 5 months ago

Hi @SeuXiao,

I appreciate your interest in our work. Please note that the Video-ChatGPT code is designed to run on a single node with multiple GPUs.
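For context, `torchrun --nproc_per_node=8` already describes a single-node launch: `--nnodes` defaults to 1, so no extra rendezvous flags are needed. A minimal sketch, assuming you want to use all 8 GPUs on one machine (the model and data paths are placeholders, as in the README):

```bash
# Single-node launch: torchrun spawns one training process per GPU on this
# machine. --nnodes defaults to 1, so no multi-node flags are required.
torchrun --nproc_per_node=8 --master_port 29001 video_chatgpt/train/train_mem.py \
    --model_name_or_path <path to LLaVA-7B-Lightening-v-1-1 model> \
    ...  # remaining flags unchanged from the command above
```

To use only a subset of the GPUs, restrict visibility and scale the process count together, e.g. `CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 ...`.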

If you face any issues, please let me know. Good luck!

xiaokj37 commented 5 months ago

Thanks for your reply. I'd now like to train Video-ChatGPT on my custom dataset. My server has 8 RTX 4090 GPUs (24 GB each), and when I launch training with torchrun I get a CUDA out-of-memory error. Does Video-ChatGPT require 40 GB of memory on each GPU?

mmaaz60 commented 5 months ago

Hi @SeuXiao

Video-ChatGPT uses a 7B LLM, which requires at least 17 GB of memory just to load. Considering the other model components and optimizer states, I believe a 32 GB GPU might work.
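As a rough sanity check on that number (assuming the weights are loaded in bf16): 7 × 10⁹ parameters × 2 bytes ≈ 14 GB (about 13 GiB) for the LLM weights alone, and the visual encoder, CUDA context, and framework buffers account for the remaining few GB at load time.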

However, please note that the code was tested on A100 40 GB GPUs.
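If you still want to try on 24 GB cards, a common mitigation (a sketch only; we have not tested this configuration) is to trade per-device batch size for gradient accumulation so the effective batch size stays at 32:

```bash
# Keep the effective batch at 8 GPUs x 1 sample x 4 accumulation steps = 32,
# matching the original 8 x 4 x 1, while cutting per-GPU activation memory.
torchrun --nproc_per_node=8 --master_port 29001 video_chatgpt/train/train_mem.py \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --gradient_checkpointing True \
    ...  # remaining flags unchanged from the command above
```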