Artyom17 opened this issue 3 months ago
According to Mistral's paper, the block size for Mistral-7B should be 8192 (refs: https://arxiv.org/pdf/2310.06825.pdf, https://huggingface.co/docs/transformers/en/model_doc/mistral), but it is currently left at the default value (2048).
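For reference, a minimal sketch of what the corrected config entry might look like, assuming a gpt-fast-style `transformer_configs` dict (the dict and field names here are illustrative, not the repo's exact code; the hyperparameter values are Mistral-7B's published ones):

```python
# Hypothetical config entry; only block_size is the fix under discussion.
transformer_configs = {
    "Mistral-7B": dict(
        block_size=8192,          # context window per the Mistral paper; was left at the 2048 default
        n_layer=32,
        n_head=32,
        n_local_heads=8,          # grouped-query attention: 8 KV heads
        dim=4096,
        intermediate_size=14336,
        vocab_size=32000,
    ),
}
```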
Sizing the `freq_cis` tensor by `max_seq_length` instead of `block_size` would also save some memory when a large `block_size` is used with a relatively small `max_seq_length`.
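Below is a hedged sketch of that idea using a generic RoPE precompute (the function name and signature are assumptions, not the repo's exact code): sizing the table by `min(block_size, max_seq_length)` rather than `block_size` alone shrinks the cached tensor when `max_seq_length` is small.

```python
import torch

def precompute_freqs_cis(seq_len: int, head_dim: int, base: float = 10000.0) -> torch.Tensor:
    """Precompute rotary-embedding (cos, sin) pairs for seq_len positions.

    Sizing this by the actual max_seq_length rather than block_size
    avoids allocating rows that are never indexed.
    """
    freqs = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len)
    freqs = torch.outer(t, freqs)
    # Shape: (seq_len, head_dim // 2, 2)
    return torch.stack([torch.cos(freqs), torch.sin(freqs)], dim=-1)

# With block_size=8192 but max_seq_length=2048, capping the table at
# min(block_size, max_seq_length) makes the cached tensor 4x smaller.
block_size, max_seq_length = 8192, 2048
freqs_cis = precompute_freqs_cis(min(block_size, max_seq_length), head_dim=128)
```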