[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
model = VideoChatGPTLlamaForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    # torch_dtype=torch.bfloat16 if training_args.bf16 else torch.float,
)
Can I use bfloat16 when training? I find that with bfloat16 I can train on 24 GB GPUs, but I'm not sure how much this affects model performance. Can you give me some advice?
I see the code uses torch.float by default.
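For reference, the commented-out line in the snippet above already hints at the intended wiring. Below is a minimal sketch of how the dtype could be selected from the training flags; the helper name `pick_dtype` is mine, and I'm assuming `training_args` follows the usual HuggingFace `TrainingArguments` convention of `bf16`/`fp16` booleans:

```python
import torch

def pick_dtype(bf16: bool, fp16: bool = False) -> torch.dtype:
    """Map mixed-precision training flags to a torch dtype (hypothetical helper)."""
    if bf16:
        # bfloat16 keeps fp32's 8-bit exponent range, trading mantissa precision
        return torch.bfloat16
    if fp16:
        return torch.float16
    return torch.float32

# The from_pretrained call would then become, e.g.:
# model = VideoChatGPTLlamaForCausalLM.from_pretrained(
#     model_args.model_name_or_path,
#     cache_dir=training_args.cache_dir,
#     torch_dtype=pick_dtype(training_args.bf16),
# )
```

Because bfloat16 retains fp32's exponent range, it is generally numerically stable for training and roughly halves activation/weight memory, which is why it fits on 24 GB GPUs; any accuracy difference versus fp32 is typically small, but it is worth verifying on your own evaluation benchmark.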