huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Can we change mistral attention bias=False to bias=config.attention_bias ? #29553

Open. Minami-su opened this issue 6 months ago

Minami-su commented 6 months ago

Just like llama attention (see the attached screenshot of the LlamaAttention code).
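For reference, a rough side-by-side of what is being asked for: llama reads the bias flag from the config, while mistral hard-codes bias=False. The class names below are illustrative sketches, not verbatim source, though the attribute names follow the modeling code.

```python
import torch.nn as nn


class LlamaStyleProjections(nn.Module):
    """Sketch of how LlamaAttention wires its projections: bias comes from the config."""

    def __init__(self, config):
        super().__init__()
        hidden = config.hidden_size
        head_dim = hidden // config.num_attention_heads
        kv_dim = config.num_key_value_heads * head_dim
        self.q_proj = nn.Linear(hidden, hidden, bias=config.attention_bias)
        self.k_proj = nn.Linear(hidden, kv_dim, bias=config.attention_bias)
        self.v_proj = nn.Linear(hidden, kv_dim, bias=config.attention_bias)
        self.o_proj = nn.Linear(hidden, hidden, bias=config.attention_bias)


class MistralStyleProjections(nn.Module):
    """Sketch of the current MistralAttention projections: bias is hard-coded to False."""

    def __init__(self, config):
        super().__init__()
        hidden = config.hidden_size
        head_dim = hidden // config.num_attention_heads
        kv_dim = config.num_key_value_heads * head_dim
        self.q_proj = nn.Linear(hidden, hidden, bias=False)
        self.k_proj = nn.Linear(hidden, kv_dim, bias=False)
        self.v_proj = nn.Linear(hidden, kv_dim, bias=False)
        self.o_proj = nn.Linear(hidden, hidden, bias=False)
```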

amyeroberts commented 5 months ago

@Minami-su Thanks for this feature request! Could you provide some more context on why you'd like to have this feature added?

In principle, it can be done by adding the config.attention_bias param which defaults to False. Care will need to be taken to make sure this is properly propagated and backwards compatible with any models which copy mistral's attention.

You'll also need to make sure that if config.attention_bias=True then the flash attention and SDPA attentions remain equivalent to the eager implementation.
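A minimal sketch of the backwards-compatible config side, assuming the flag mirrors LlamaConfig's attention_bias; the subclass name here is hypothetical and only for illustration:

```python
from transformers import MistralConfig


class MistralConfigWithBias(MistralConfig):
    """Hypothetical sketch: expose an `attention_bias` flag that defaults to False,
    so existing mistral checkpoints (which ship no bias tensors) keep loading and
    behaving exactly as they do today; only configs that opt in would change."""

    def __init__(self, attention_bias=False, **kwargs):
        super().__init__(**kwargs)
        self.attention_bias = attention_bias
```

The attention layers would then read bias=config.attention_bias (as llama does), and any model whose attention is "# Copied from" mistral's would need the same change so that make fix-copies stays consistent.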

Minami-su commented 5 months ago

(Screenshot of the MistralAttention code attached.) The default setting for mistral is bias=False, so setting config.attention_bias=True is useless. @amyeroberts

amyeroberts commented 5 months ago

@Minami-su If it's useless then there's no point in adding this config parameter

Minami-su commented 5 months ago

> @Minami-su If it's useless then there's no point in adding this config parameter

I have a model: https://huggingface.co/Minami-su/Qwen1.5-7B-Chat_mistral. Qwen's attention has bias weights, but the model architecture is mistral, so I need a way to enable attention bias in the mistral architecture.
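A sketch of the intended usage once such a flag exists. Note that attention_bias currently has no effect on mistral, so as written this only works with a custom or modified modeling file:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical usage: a mistral-architecture checkpoint converted from Qwen,
# whose q/k/v projections carry bias tensors. With an attention_bias flag the
# biases could be loaded instead of being silently dropped.
config = AutoConfig.from_pretrained("Minami-su/Qwen1.5-7B-Chat_mistral")
config.attention_bias = True  # currently ignored by modeling_mistral.py
model = AutoModelForCausalLM.from_pretrained(
    "Minami-su/Qwen1.5-7B-Chat_mistral",
    config=config,
)
```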

amyeroberts commented 5 months ago

> so setting config.attention_bias=True is useless

If this is useless, then there's no value in being able to set it. If it's not useless, and you do want to add a config parameter to control the addition of bias, as you've done in your custom model, then it's necessary to make sure the different attention implementations remain equivalent.
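A minimal sketch of the kind of equivalence check that would be needed, shown with llama (which already supports attention_bias) and assuming a transformers version where from_config accepts attn_implementation; the tiny config and tolerance are illustrative, and the real checks belong in the transformers test suite:

```python
import torch
from transformers import AutoModelForCausalLM, LlamaConfig

# Tiny random llama with attention bias enabled.
config = LlamaConfig(
    vocab_size=128,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    attention_bias=True,
)

torch.manual_seed(0)
eager_model = AutoModelForCausalLM.from_config(config, attn_implementation="eager").eval()
sdpa_model = AutoModelForCausalLM.from_config(config, attn_implementation="sdpa").eval()
sdpa_model.load_state_dict(eager_model.state_dict())  # use identical random weights

input_ids = torch.randint(0, config.vocab_size, (1, 16))
with torch.no_grad():
    logits_eager = eager_model(input_ids).logits
    logits_sdpa = sdpa_model(input_ids).logits

# The eager and SDPA paths should agree (up to numerical noise) with bias enabled.
assert torch.allclose(logits_eager, logits_sdpa, atol=1e-4)
```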