turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

add flash attention feature to different seqlen batch #237

Closed: fahadh4ilyas closed this issue 9 months ago

fahadh4ilyas commented 9 months ago

I just realized it contains copyrighted code from huggingface, so I have to change it.
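For context on the feature being discussed: flash attention over a batch of sequences with different lengths is typically done by packing all tokens into one buffer and passing cumulative sequence offsets (the `cu_seqlens` convention used by flash-attn's variable-length kernels). This is a minimal sketch of building that index; the lengths are hypothetical and no GPU code is involved:

```python
import itertools

def cu_seqlens(lengths):
    """Cumulative sequence lengths for a packed batch: cu_seqlens[i]
    is the offset where sequence i starts in the flattened token buffer,
    and cu_seqlens[-1] is the total token count."""
    return [0] + list(itertools.accumulate(lengths))

# Example: a batch of three sequences with different lengths
lengths = [5, 3, 7]
offsets = cu_seqlens(lengths)
print(offsets)        # [0, 5, 8, 15]
print(max(lengths))   # 7 (the max_seqlen the kernel also needs)
```

A variable-length kernel then attends within each `[offsets[i], offsets[i+1])` slice independently, so no padding tokens are computed for the shorter sequences.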