oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Exllamav2 qcache support (Q4, Q6, Q8) #6303

Open Anthonyg5005 opened 3 months ago

Anthonyg5005 commented 3 months ago

Description

New cache options were added to ExLlamaV2 a while ago. There are now Q6 and Q8 options, which I don't think have been added here. They would be useful for people who can use options like Q8 to squeeze out a bit more performance.
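For a rough sense of the trade-off between these cache options, here is a minimal sketch that estimates KV cache size at each nominal bit width. The model shape is hypothetical and chosen only for illustration, and the helper names (`kv_cache_bytes`, `cache_sizes_mib`) are not part of ExLlamaV2 or the web UI. Real quantized caches also store scale data, so actual sizes would likely be somewhat larger than these figures.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bits_per_element):
    """Approximate bytes for keys plus values of one sequence.

    Ignores quantization scales and any allocator overhead.
    """
    # Factor of 2 accounts for storing both keys and values.
    elements = 2 * num_layers * num_kv_heads * head_dim * seq_len
    return elements * bits_per_element / 8


# Hypothetical model shape (assumption, not taken from any specific model).
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192)

# Nominal bit widths only; the real formats carry extra metadata.
NOMINAL_BITS = {"FP16": 16, "Q8": 8, "Q6": 6, "Q4": 4}


def cache_sizes_mib(shape):
    """Return the estimated cache size in MiB for each nominal cache type."""
    return {
        name: kv_cache_bytes(bits_per_element=bits, **shape) / 2**20
        for name, bits in NOMINAL_BITS.items()
    }


for name, mib in cache_sizes_mib(shape).items():
    print(f"{name}: {mib:.0f} MiB")
```

Under these assumptions, Q8 would take about half the memory of an FP16 cache, with Q6 and Q4 at three eighths and one quarter respectively.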

Additional Context

The fp8 cache is also outdated and may be removed in the future.

randoentity commented 3 months ago

https://github.com/oobabooga/text-generation-webui/pull/6280