A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
2.66k stars, 214 forks
test_inference.py : AttributeError: module 'exllamav2_ext' has no attribute 'rms_norm' #302
DFuller134 closed 9 months ago