turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License
2.67k stars 214 forks

ws example for streaming with context reuse and token testing #249

Kerushii closed 10 months ago