Open alex4321 opened 1 year ago
Okay, I see it should be able to use GPTQ weights (safetensors format). I am facing another issue while trying to generate text, but that's off-topic for this issue.
So my problem should be solvable (well, at least if the vicuna code itself is not different).
Currently this repo does not have any code related to model quantization. I think you can use it from the GPTQ repo or GPTQ-LLAMA.
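For intuition about what those repos do, here is a naive 4-bit round-to-nearest sketch. This is not GPTQ itself (GPTQ additionally updates the not-yet-quantized weights to compensate for rounding error), but the storage idea - small integers plus a scale factor - is the same. Function names here are illustrative, not from any repo:

```python
def quantize_4bit(weights):
    """Map floats to integers in [-8, 7] with a single scale factor."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from int4 values and the scale."""
    return [v * scale for v in q]

row = [0.12, -0.7, 0.33, 0.05]
q, scale = quantize_4bit(row)
approx = dequantize_4bit(q, scale)
```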
As far as I can see, the models don't use GGML weights here - is that correct? (I have not checked yet.)
If so - what script is used to make the 4-bit quantization here?
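One quick way to answer the GGML question yourself is to peek at the file header. A rough sketch (the helper name is my own; the magic values are the ones llama.cpp-era GGML files used, and safetensors files start with an 8-byte little-endian JSON-header length followed by the JSON itself):

```python
import struct

# "ggml" and "ggjt" magics, read as little-endian u32 from the file start.
GGML_MAGICS = {0x67676D6C, 0x67676A74}

def detect_weight_format(path):
    """Best-effort guess of a weight file's container format."""
    with open(path, "rb") as f:
        head = f.read(16)
    if len(head) >= 4 and struct.unpack("<I", head[:4])[0] in GGML_MAGICS:
        return "ggml"
    # safetensors: u64 header length, then a JSON object starting with '{'.
    if len(head) >= 9 and head[8:9] == b"{":
        return "safetensors"
    return "unknown"
```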
For instance - if I am going to try vicuna (which is LLaMA + some delta weights), I can generate the full vicuna weights, but then I will need to perform quantization - which one is used in this library's case?
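For reference, the delta step itself is just element-wise addition of the published delta tensors to the base LLaMA tensors (tools such as FastChat ship a script for this over real torch state dicts). A toy sketch with plain Python lists standing in for tensors:

```python
def apply_delta(base_state_dict, delta_state_dict):
    """Add delta weights to base weights, key by key."""
    merged = {}
    for name, base_w in base_state_dict.items():
        delta_w = delta_state_dict[name]
        merged[name] = [b + d for b, d in zip(base_w, delta_w)]
    return merged

# Toy example: one "layer" with a 3-element weight vector.
base = {"layers.0.weight": [1.0, 2.0, 3.0]}
delta = {"layers.0.weight": [0.5, -0.5, 0.0]}
merged = apply_delta(base, delta)
```

Only after this merge would the 4-bit quantization step run on the full merged weights.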