Description
I have seen Q6 cache support mentioned many times, by the ExLlama dev and other people:
But I see no way to enable it in text-generation-webui. I tried updating text-generation-webui and switching to the dev branch, but still could not find the option. It would be great if it were added.
Additional Context
While the 4-bit cache can work great, it can also reduce quality or even break some models. The 6-bit cache offers consistently high quality and, according to the test results provided in the first link above, may offer the best quality-to-memory-savings ratio.