turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License
3.53k stars, 274 forks

How to implement the backend for dynamic batching? #250

Closed by tanklandry 3 months ago

tanklandry commented 9 months ago

exl2 quantize

turboderp commented 3 months ago

Dynamic batching is now supported.
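
For readers wondering what a dynamic-batching backend does in general, here is a minimal, library-agnostic sketch in plain Python. It is not exllamav2's implementation and does not use its API: `Job`, `DynamicBatcher`, `toy_step`, and `run_all` are hypothetical names invented for illustration, and `toy_step` only stands in for a real batched forward pass. The idea shown is that finished jobs leave the batch after every step and queued jobs fill the freed slots immediately, rather than waiting for the whole batch to drain.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple


@dataclass
class Job:
    """One generation request (hypothetical type, for illustration only)."""
    job_id: str
    prompt: List[int]
    max_new_tokens: int  # assumed to be >= 1
    generated: List[int] = field(default_factory=list)

    def done(self) -> bool:
        return len(self.generated) >= self.max_new_tokens


def toy_step(batch: List[Job]) -> List[int]:
    """Stand-in for one batched forward pass: returns one token per job."""
    return [(sum(job.prompt) + len(job.generated)) % 100 for job in batch]


class DynamicBatcher:
    """Keeps up to max_batch_size jobs active and refills slots every step."""

    def __init__(self, step_fn: Callable[[List[Job]], List[int]], max_batch_size: int):
        self.step_fn = step_fn
        self.max_batch_size = max_batch_size
        self.pending: Deque[Job] = deque()
        self.active: List[Job] = []

    def submit(self, job: Job) -> None:
        self.pending.append(job)

    def has_work(self) -> bool:
        return bool(self.pending or self.active)

    def iterate(self) -> List[Job]:
        # Admit queued jobs into any free slots before running the step.
        while self.pending and len(self.active) < self.max_batch_size:
            self.active.append(self.pending.popleft())
        if not self.active:
            return []
        # One decoding step for every active job at once.
        for job, token in zip(self.active, self.step_fn(self.active)):
            job.generated.append(token)
        # Retire finished jobs so their slots are free on the next step.
        finished = [job for job in self.active if job.done()]
        self.active = [job for job in self.active if not job.done()]
        return finished


def run_all(jobs: List[Job], max_batch_size: int = 2) -> Tuple[List[str], int]:
    """Runs all jobs; returns completion order and the number of steps taken."""
    batcher = DynamicBatcher(toy_step, max_batch_size)
    for job in jobs:
        batcher.submit(job)
    order: List[str] = []
    steps = 0
    while batcher.has_work():
        order.extend(job.job_id for job in batcher.iterate())
        steps += 1
    return order, steps
```

With three jobs needing 3, 1, and 2 tokens and a batch size of 2, the third job takes over the slot freed by the second after the first step, so everything completes in 3 steps. Running the first two as a fixed batch and the third afterwards would take 5.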