turboderp/exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License
2.74k stars · 215 forks
Any blogs on the project? #264
Open · qizzzh opened this issue 1 year ago

qizzzh commented 1 year ago:
Trying to learn more about the optimizations.