turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

add flash attention feature to different seqlen batch #237

Closed: fahadh4ilyas closed this issue 9 months ago

fahadh4ilyas commented 9 months ago

I just realized it contains copyrighted code from huggingface, so I have to change it.
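For context on the feature being discussed: flash attention over a batch of sequences with different lengths is typically done by packing all tokens into one buffer and passing cumulative sequence offsets (the `cu_seqlens` convention used by flash-attn's variable-length kernels). This is a minimal sketch of building that index; the lengths are hypothetical and no GPU code is involved:

```python
import itertools

def cu_seqlens(lengths):
    """Cumulative sequence lengths for a packed batch: cu_seqlens[i]
    is the offset where sequence i starts in the flattened token buffer,
    and cu_seqlens[-1] is the total token count."""
    return [0] + list(itertools.accumulate(lengths))

# Example: a batch of three sequences with different lengths
lengths = [5, 3, 7]
offsets = cu_seqlens(lengths)
print(offsets)        # [0, 5, 8, 15]
print(max(lengths))   # 7 (the max_seqlen the kernel also needs)
```

A variable-length kernel then attends within each `[offsets[i], offsets[i+1])` slice independently, so no padding tokens are computed for the shorter sequences.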