ParadoxZW opened 1 year ago
What is the `max_seq_len` (or `max_position_embeddings`) of Mistral-7B-v0.1 when training? The official code says it is 128_000 (https://github.com/mistralai/mistral-src/blob/147c4e68279b90eb61b19bdea44e16f5539d5a5d/mistral/model.py#L201C69-L201C69).
The config file on Hugging Face says it is 32768 (https://huggingface.co/mistralai/Mistral-7B-v0.1/blob/main/config.json).
And the official blog mentions 16k.
And the paper claims an attention span of 131K tokens (Section 2 on "Architectural details" → "Sliding Window Attention").
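For reference, the value the released checkpoint actually advertises can be read straight from the Hub config. A minimal sketch, assuming the `transformers` library and the public `mistralai/Mistral-7B-v0.1` repo id:

```python
from transformers import AutoConfig

# Pull the published config.json for the checkpoint from the Hugging Face Hub.
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")

# Maximum positions the config advertises (32768 in the published config.json).
print(config.max_position_embeddings)

# Sliding-window attention size (4096), which bounds each layer's attention span.
print(config.sliding_window)
```

But printing these fields doesn't settle which sequence length was actually used during training, which is what I'm asking.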