Open Tedy50 opened 4 months ago
Looks like there is a recent idea about compressing models during quantization, with pretty good results: https://twitter.com/rohanpaul_ai/status/1755521957058257033

exllama is currently the best thing we have for AI inference in terms of performance, and this kind of model compression could take it to the next level by allowing bigger models to fit into VRAM.
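As a rough back-of-the-envelope illustration of why lower bits per weight matter for VRAM (a sketch with an assumed 70B parameter count, not tied to any specific method from the linked tweet; it counts weights only and ignores the KV cache and activations):

```python
def model_vram_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory footprint in GiB."""
    return n_params * bits_per_weight / 8 / 1024**3

# Compare a hypothetical 70B-parameter model at different bit widths
for bits in (16, 4, 2):
    print(f"70B @ {bits}-bit: {model_vram_gib(70e9, bits):.1f} GiB")
```

Halving the bits per weight halves the weight footprint, which is why going from 4-bit toward ~2-bit quantization is attractive for fitting larger models on a single GPU.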