unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

When Flash Attention 2 is used and "use_dora = True", errored out: "RuntimeError: FlashAttention only support fp16 and bf16 data type" #1013

Open rohhro opened 2 months ago

rohhro commented 2 months ago

When FA2 is enabled ("FA2 = True" shows up in the startup banner when tuning),

"Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.2. \ /| GPU: NVIDIA GeForce RTX 4090. Max memory: 23.617 GB. Platform = Linux. O^O/ _/ \ Pytorch: 2.4.0. CUDA = 8.9. CUDA Toolkit = 12.1. \ / Bfloat16 = TRUE. FA [Xformers = 0.0.27.post2. FA2 = True]"

and "use_dora = True," in the script,

it always errors out "RuntimeError: FlashAttention only support fp16 and bf16 data type". And there is no way to disable FA2 in the script - I have tried many FA2 configs in the script.

The only way to use DoRA is to run Unsloth in an environment that has no FA2 installed.
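For context, here is a minimal sketch of the kind of setup being described, assuming the DoRA flag is passed through FastLanguageModel.get_peft_model as in the usual Unsloth LoRA recipe. The reporter's exact script is not shown in the issue, and the model name and hyperparameters below are placeholders, not taken from the report.

```python
# Minimal sketch (not the reporter's exact script): Unsloth LoRA setup with
# DoRA enabled, on a machine where flash-attn is installed so FA2 is picked up.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # placeholder; the issue does not name a model
    max_seq_length = 2048,
    dtype = None,          # auto-detects bfloat16 on an RTX 4090
    load_in_4bit = True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    use_dora = True,   # the flag from the report; with FA2 active, training
                       # fails with "FlashAttention only support fp16 and bf16 data type"
)
```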

danielhanchen commented 2 months ago

@rohhro Sorry on the delay! Did you use bf16 = True or fp16 = True in the trainer?

rohhro commented 2 months ago

> @rohhro Sorry on the delay! Did you use bf16 = True or fp16 = True in the trainer?

I have tried both bf16 = True and fp16 = True. Same error in both cases.
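For reference, a minimal sketch of the two precision settings in question, assuming the standard TRL SFTTrainer setup used in Unsloth examples; `model`, `tokenizer`, and `dataset` are placeholders carried over from the snippet above, not from the report.

```python
# Sketch of the trainer precision settings that were tried. Per the report,
# neither bf16 = True nor fp16 = True avoids the error once FA2 is active
# together with use_dora = True.
import torch
from transformers import TrainingArguments
from trl import SFTTrainer

args = TrainingArguments(
    output_dir = "outputs",
    per_device_train_batch_size = 2,
    max_steps = 60,
    bf16 = torch.cuda.is_bf16_supported(),      # True on an RTX 4090
    fp16 = not torch.cuda.is_bf16_supported(),  # fallback path; also reproduces the error
)

trainer = SFTTrainer(
    model = model,            # PEFT model from the sketch above
    tokenizer = tokenizer,
    train_dataset = dataset,  # placeholder dataset
    args = args,
)
trainer.train()
```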