turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Continuous Batching support #237

Open FireMasterK opened 11 months ago

FireMasterK commented 11 months ago

vLLM and HF's TGI (text-generation-inference) both support this.

Additional Context: https://github.com/turboderp/exllama/issues/150#issuecomment-1633417028
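
For anyone unfamiliar with the request: continuous (in-flight) batching means requests join and leave the running batch at every decode step, instead of waiting for a whole static batch to finish. A toy scheduler sketch below illustrates the idea — it is not exllama's API, just a simulation where each request needs a fixed number of decode steps and slots are reused the moment a sequence finishes:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    rid: int
    tokens_left: int                      # decode steps still needed
    output: list = field(default_factory=list)

def continuous_batching(incoming, max_batch=4):
    """Toy continuous-batching loop: admit and retire requests per step."""
    queue = deque(incoming)
    active, finished, step = [], [], 0
    while queue or active:
        # Admit waiting requests into free batch slots mid-stream.
        while queue and len(active) < max_batch:
            active.append(queue.popleft())
        # One "decode step": every active request emits one token.
        for r in active:
            r.output.append(f"tok{step}")
            r.tokens_left -= 1
        step += 1
        # Retire finished requests immediately, freeing their slots
        # for the next waiting request (the key win over static batching).
        finished.extend(r for r in active if r.tokens_left == 0)
        active = [r for r in active if r.tokens_left > 0]
    return finished, step

reqs = [Request(0, 3), Request(1, 1), Request(2, 5), Request(3, 2), Request(4, 2)]
done, steps = continuous_batching(reqs, max_batch=2)
```

With these lengths (13 tokens total, batch width 2) the continuous scheduler finishes in 7 steps, whereas static batches of 2 would take 3 + 5 + 2 = 10 steps, since each static batch runs until its longest member completes.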