HarikrishnanK9 closed this issue 1 month ago.
Hi @HarikrishnanK9, Thank you for bringing up this issue. The warning message indicates that you are not running the flash-attention implementation, which may result in numerical differences. However, I want to assure you that this does not affect the actual fine-tuning process. Using flash-attention can provide certain performance benefits, but it is not essential for fine-tuning.
Some tutorials use other methods, such as eager attention instead of flash-attention, which can trigger the warning you mentioned. Again, this warning does not affect the fine-tuning process itself.
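For example, here is a minimal sketch of how the warning arises (assuming the standard transformers loader; "eager" just makes the default attention path explicit):

import torch
from transformers import AutoModelForCausalLM

# Loading without flash-attn falls back to eager attention, which is what
# triggers the "not running the flash-attention implementation" warning.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="eager",
)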
@HarikrishnanK9 If you wish to use "flash_attention_2," you can install the flash-attn package by running the following command (if the build fails, the flash-attn README recommends adding --no-build-isolation):

pip install flash-attn
Then, update the model configuration as shown below:
import torch

model_kwargs = {
    "use_cache": False,
    "trust_remote_code": True,
    "torch_dtype": torch.bfloat16,
    "device_map": None,
    "attn_implementation": "flash_attention_2",
}
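For reference, a minimal sketch of how these kwargs are typically passed to the loader (assuming you load the model with transformers' AutoModelForCausalLM; the model ID matches the Phi-3 checkpoint from this thread):

from transformers import AutoModelForCausalLM

# Unpack the kwargs dict directly into from_pretrained at load time.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    **model_kwargs,
)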
Please note that "flash_attention_2" is only available on certain GPUs: FlashAttention-2 requires an Ampere, Ada, or Hopper NVIDIA GPU (compute capability 8.0 or higher), such as the A100 or H100.
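If you are unsure whether your GPU qualifies, a quick check (assuming PyTorch with CUDA available):

import torch

# FlashAttention-2 needs compute capability 8.0+ (Ampere, Ada, or Hopper,
# e.g. A100, RTX 4090, H100).
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability {major}.{minor}:",
      "supported" if major >= 8 else "not supported")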
For more information, you may find these documents helpful as they describe the fine-tuning process using flash_attention:
Thank you @skytin1004, the issue is resolved. Setting "attn_implementation": "flash_attention_2" in model_kwargs worked for me.
I just got the error:

The following `model_kwargs` are not used by the model: ['attn_implementation'] (note: typos in the generate arguments will also show up in this list)
@skytin1004 Any idea why? I am running the inference API on an NVIDIA A100.
WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.c1358f8a35e6d2af81890deffbbfa575b978c62f.modeling_phi3:You are not running the flash-attention implementation, expect numerical differences.
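In case it helps others hitting the same error: this message typically appears when attn_implementation is forwarded to generate() or a pipeline as a generation kwarg instead of being given to from_pretrained() at load time. A minimal sketch of passing it at load time (assuming the standard transformers API; the prompt and max_new_tokens are placeholder examples):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"

# attn_implementation belongs to the model loader; generate() rejects
# kwargs the model does not consume, producing the error above.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))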