[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
model = VideoChatGPTLlamaForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    # torch_dtype=torch.bfloat16 if training_args.bf16 else torch.float,
)
Can I use bfloat16 when training? I find that with bfloat16 I can train on 24 GB GPUs, but I'm not sure how much this affects model performance. Can you give me some advice?
I see the code uses torch.float by default.
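For reference, the commented-out line in the snippet above already hints at the intended wiring. Below is a minimal sketch of how the dtype could be selected from the training flags; the helper name `pick_dtype` is mine, and I'm assuming `training_args` follows the usual HuggingFace `TrainingArguments` convention of `bf16`/`fp16` booleans:

```python
import torch

def pick_dtype(bf16: bool, fp16: bool = False) -> torch.dtype:
    """Map mixed-precision training flags to a torch dtype (hypothetical helper)."""
    if bf16:
        # bfloat16 keeps fp32's 8-bit exponent range, trading mantissa precision
        return torch.bfloat16
    if fp16:
        return torch.float16
    return torch.float32

# The from_pretrained call would then become, e.g.:
# model = VideoChatGPTLlamaForCausalLM.from_pretrained(
#     model_args.model_name_or_path,
#     cache_dir=training_args.cache_dir,
#     torch_dtype=pick_dtype(training_args.bf16),
# )
```

Because bfloat16 retains fp32's exponent range, it is generally numerically stable for training and roughly halves activation/weight memory, which is why it fits on 24 GB GPUs; any accuracy difference versus fp32 is typically small, but it is worth verifying on your own evaluation benchmark.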