sdan / selfextend

an implementation of Self-Extend, to expand the context window via grouped attention
https://arxiv.org/pdf/2401.01325.pdf
Apache License 2.0

SelfExtend Attention for Mistral

Implementation of the Self-Extend paper, which uses grouped attention to extend the context window of LLMs without fine-tuning or additional pre-training.

Overview

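The SelfExtend mechanism modifies the standard attention in the Mistral model so that it can attend over sequences longer than its pretraining context window. Nearby tokens keep ordinary (neighbor) attention with exact relative positions, while distant tokens use grouped attention, in which positions are floor-divided by a group size so that large relative distances map back into the range seen during pretraining. No weights are changed, which makes the approach particularly useful for tasks involving long sequences.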
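As a rough illustration of the position mapping described in the Self-Extend paper, the sketch below computes the relative position used for a query/key pair. It is not taken from this repository's modeling files: the function names and the parameters `group_size` and `neighbor_window` are hypothetical, and the extended-length formula is the one stated in the paper, assuming the neighbor window is a multiple of the group size.

```python
def self_extend_relative_position(q_pos, k_pos, group_size, neighbor_window):
    """Relative position for a causal query/key pair under Self-Extend.

    Hypothetical helper for illustration; names are not from this repository.
    """
    if k_pos > q_pos:
        raise ValueError("causal attention: key position must not exceed query position")
    distance = q_pos - k_pos
    # Nearby tokens: ordinary attention with the exact relative distance.
    if distance < neighbor_window:
        return distance
    # Distant tokens: floor-divide positions by the group size, then shift the
    # query so the grouped range continues where the neighbor window ends.
    shift = neighbor_window - neighbor_window // group_size
    return (q_pos // group_size + shift) - (k_pos // group_size)


def max_extended_length(pretrain_window, group_size, neighbor_window):
    """Longest sequence whose mapped positions stay within the pretraining window."""
    return (pretrain_window - neighbor_window) * group_size + neighbor_window


if __name__ == "__main__":
    # Relative positions seen by a query at position 100 for a few keys.
    for k in (98, 93, 92, 0):
        print(k, self_extend_relative_position(100, k, group_size=4, neighbor_window=8))
    print(max_extended_length(4096, group_size=4, neighbor_window=1024))
```

With a larger group size, more distant positions share the same relative position, which trades positional precision for a longer usable context.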

Features

Requirements

To use this implementation, the following prerequisites must be met:

Installation

Clone the repository to your local machine, then copy the modeling files into transformers/src/transformers/models/mistral in your transformers source tree.

When loading the pretrained weights, select the self_extend attention implementation as follows:

from transformers import MistralForCausalLM
model = MistralForCausalLM.from_pretrained("hf_mistral-7B-v0.1", attn_implementation="self_extend")