turboderp/exllama: A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
Issue #161: FlashAttention-2, 2x faster than FlashAttention
Closed by nikshepsvn 1 year ago
nikshepsvn commented 1 year ago:
https://twitter.com/tri_dao/status/1680987580228308992
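
For context, FlashAttention-2 is distributed as v2 of the `flash-attn` Python package, and its fused attention kernel is exposed through `flash_attn_func`. Below is a minimal sketch of calling that kernel directly; it illustrates the package's interface, not exllama's integration, and it assumes a CUDA GPU with fp16 inputs (the tensor shapes are arbitrary examples).

```python
# Minimal sketch of invoking the FlashAttention-2 kernel via the flash-attn
# v2 package. Assumes a CUDA device and fp16 tensors, which the kernel requires.
import torch
from flash_attn import flash_attn_func

# Arbitrary example shapes: (batch, seqlen, num_heads, head_dim).
q = torch.randn(1, 2048, 32, 128, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# causal=True applies the autoregressive mask used by decoder-only models
# such as Llama; the output has the same shape as q.
out = flash_attn_func(q, k, v, causal=True)
```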