sdan / selfextend

An implementation of Self-Extend, which expands the context window via grouped attention
https://arxiv.org/pdf/2401.01325.pdf
Apache License 2.0

Attention implementation through torch.nn.functional.scaled_dot_product_attention not supported #5

Open eightBEC opened 6 months ago

eightBEC commented 6 months ago

I followed the steps in the README and copied the three modeling files modeling_mistral.py, modeling_utils.py, and configuration_mistral.py into my transformers folders:

Target folders for the changed files:

- lib/python3.11/site-packages/transformers/
- /lib64/python3.11/site-packages/transformers
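(A sanity check I added myself, not part of the README: printing where the installed transformers package lives, and hence where its models/mistral subfolder is, shows exactly where the patched files should land.)

```python
# Sanity check (mine, not from the README): print where the installed
# transformers package lives, and hence where models/mistral is, so the
# patched files can be copied into the copy Python actually uses.
import os
import transformers

pkg_dir = os.path.dirname(transformers.__file__)
print(pkg_dir)                                     # .../site-packages/transformers
print(os.path.join(pkg_dir, "models", "mistral"))  # destination for the modeling files
```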

Clone the repository to your local machine and copy the modeling files into transformers/src/transformers/models/mistral

When initializing the weights, specify the self_extend attention mechanism like so:

```python
model = MistralForCausalLM.from_pretrained("hf_mistral-7B-v0.1", attn_implementation="self_extend")
```
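While debugging, I wrap the call above in a temporary fallback so the model stays loadable; this is just a sketch of my own, not a fix, since it gives up Self-Extend whenever the registration fails:

```python
from transformers import MistralForCausalLM

# Temporary fallback sketch (mine, not from the README): if the
# self_extend implementation is not registered, load with the stock
# default attention instead, which forgoes Self-Extend for that run.
try:
    model = MistralForCausalLM.from_pretrained(
        "hf_mistral-7B-v0.1", attn_implementation="self_extend"
    )
except ValueError:
    model = MistralForCausalLM.from_pretrained("hf_mistral-7B-v0.1")
```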

Running the model results in the following error:

```
lib64/python3.11/site-packages/transformers/modeling_utils.py", line 1491, in _check_and_enable_sdpa
    raise ValueError(
ValueError: MistralForCausalLM does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new
```
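The check that raises here is _check_and_enable_sdpa in the stock modeling_utils.py, so a first thing to verify (my own diagnostic, not from the README) is whether Python is actually importing the patched copies of the modeling files:

```python
# Diagnostic (mine, not from the README): confirm which copies of the
# modeling files Python actually imports. If these paths point at
# unpatched files, the self_extend implementation was never registered.
import transformers.modeling_utils as mu
import transformers.models.mistral.modeling_mistral as mm

print(mu.__file__)  # should be the patched modeling_utils.py
print(mm.__file__)  # should be the patched modeling_mistral.py
```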

Versions: