huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Adding mixtral attention_bias in style of llama modeling #28440

Open Moreh-LeeJunhyeok opened 10 months ago

Moreh-LeeJunhyeok commented 10 months ago

Feature request

System Info

transformers version: 4.36.2

Who can help?

I don't have a clue who to tag for this.

Information

Referring to the Llama 2 modeling code, I would like to add an attention bias option to the Mixtral model and its configuration, for more flexibility in experiments.
If this change seems appropriate, I will open a PR for it.
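For reference, here is a minimal sketch of what the change might look like in the attention module, following the Llama pattern. The attribute name attention_bias mirrors LlamaConfig and is an assumption here, since it does not exist in MixtralConfig as of 4.36.2; the class is heavily simplified.

import torch.nn as nn

class MixtralAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.hidden_size = config.hidden_size
        self.num_heads = config.num_attention_heads
        self.head_dim = self.hidden_size // self.num_heads
        self.num_key_value_heads = config.num_key_value_heads
        # Currently bias=False is hard-coded; the proposal makes it configurable
        # via the (assumed) config.attention_bias flag, as in LlamaAttention.
        self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=config.attention_bias)
        self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
        self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=config.attention_bias)
        self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=config.attention_bias)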

Expected behavior

After the change, an attention bias option is added to the model config.
It can be controlled as in the example below (the default config value is False).

from transformers import AutoConfig

# Load a Mixtral config and enable the proposed attention_bias option (default: False)
config = AutoConfig.from_pretrained("variant_of_mixtral")
config.attention_bias = True
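The modified config could then be passed when instantiating a model, for example as below. This is a hedged sketch: "variant_of_mixtral" is the placeholder checkpoint name from above, and if the checkpoint was trained without bias, the newly added bias parameters would be freshly initialized (from_pretrained reports them as missing keys).

from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("variant_of_mixtral")
config.attention_bias = True
# Pass the edited config so the attention projections are built with bias terms
model = AutoModelForCausalLM.from_pretrained("variant_of_mixtral", config=config)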

Motivation

As in the Llama 2 modeling code, adding an attention bias option to the Mixtral model and configuration would give more flexibility for experiments.

Your contribution

I have created a fix branch and can open a PR from it; refer to the link.

Moreh-LeeJunhyeok commented 10 months ago

What is the status of this issue? Is it in progress? Thanks in advance.

amyeroberts commented 10 months ago

Hi @Moreh-LeeJunhyeok, thanks for opening an issue!

Note - making experiments easier isn't in and of itself enough of a reason to add something to a model. However, as there was the equivalent added to Llama, this seems reasonable. Could you open a PR and we can review your proposed changes?

JohnHerry commented 4 months ago

Sorry to disturb. Is there any article about the difference in performance when training Llama with or without attention bias enabled?