turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Add example of max seq length configuration #210

Closed · vadi2 closed this 1 year ago

vadi2 commented 1 year ago

Add an example of max seq length configuration. It might be helpful for others to know how to adjust this property, since Llama 2 works with a 4096-token context.
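For reference, a minimal sketch of the kind of override being suggested, following the layout of exllama's example scripts; the model directory path is a placeholder:

```python
import glob
import os

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer

# Placeholder path; point this at a quantized Llama 2 model directory
model_directory = "/path/to/llama2-13b-4bit-128g/"

tokenizer_path = os.path.join(model_directory, "tokenizer.model")
model_config_path = os.path.join(model_directory, "config.json")
model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

config = ExLlamaConfig(model_config_path)  # defaults come from config.json
config.model_path = model_path
config.max_seq_len = 4096                  # explicit override: Llama 2's context length

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)
```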

turboderp commented 1 year ago

Idk tbh. The default is already loaded from the model, and this would override it, causing confusion for other people trying to use Llama 2 etc.
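In other words, something like the following should already pick up the right value without an override; this is a sketch that assumes the default is read from the sequence-length field in the model's config.json, and the path is a placeholder:

```python
from model import ExLlamaConfig

# ExLlamaConfig reads its defaults from the model's config.json,
# including the maximum sequence length, so no manual override is
# needed for a model that declares its own context size.
config = ExLlamaConfig("/path/to/llama2-model/config.json")  # placeholder path
print(config.max_seq_len)  # expected: 4096 for a Llama 2 model
```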

vadi2 commented 1 year ago

Aha, I didn't know those defaults were loaded.