johnsmith0031 / alpaca_lora_4bit


Which script was used for 4-bit quantization? #100

Open alex4321 opened 1 year ago

alex4321 commented 1 year ago

As far as I can see, the model doesn't use GGML weights here; is that correct? (I haven't checked yet.)

If so, which script is used for 4-bit quantization here?

For instance, if I try Vicuna (which is LLaMA plus some delta weights), I can generate the full Vicuna weights, but then I will need to perform quantization. Which method is used in this library's case?
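
(Generating the full Vicuna weights is the easy part. A rough sketch of the merge, assuming the published deltas are plain element-wise weight diffs and ignoring tokenizer/vocab-size details; paths are placeholders, and FastChat also ships an official apply_delta tool for this:)

```python
# Hypothetical sketch: reconstruct full Vicuna weights by adding the
# published delta weights to the base LLaMA weights. Paths are placeholders.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "/path/to/llama-13b", torch_dtype=torch.float16)
delta = AutoModelForCausalLM.from_pretrained(
    "/path/to/vicuna-13b-delta", torch_dtype=torch.float16)

# The deltas are element-wise weight differences, so summing each base
# parameter with its delta yields the fine-tuned model. Use .data to
# modify the leaf tensors in place without tripping autograd.
delta_sd = delta.state_dict()
for name, param in base.state_dict().items():
    param.data += delta_sd[name]

base.save_pretrained("/path/to/vicuna-13b-full")
```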

alex4321 commented 1 year ago

Okay, I see it should be able to use GPTQ weights (safetensors format). I am facing other issues while trying to generate text, but those are off-topic for this issue.

So my problem should be solvable (well, at least if the Vicuna code itself is not different).
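
For reference, loading pre-quantized GPTQ weights with this repo looks roughly like the snippet below, going by the loader the README shows. The exact signature may differ between versions, and the paths are placeholders:

```python
# Rough sketch based on this repo's README; exact arguments may differ.
from autograd_4bit import load_llama_model_4bit_low_ram

config_path = "/path/to/vicuna-13b-4bit/"  # HF config + tokenizer files
model_path = "/path/to/vicuna-13b-4bit-128g.safetensors"  # GPTQ weights

model, tokenizer = load_llama_model_4bit_low_ram(config_path, model_path)
```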

johnsmith0031 commented 1 year ago

Currently this repo does not have any code related to model quantization. I think you can use the quantization scripts from the GPTQ repo or GPTQ-LLAMA.
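
For example, with GPTQ-LLAMA something like the following should produce 4-bit safetensors weights. The flags come from that repo's README and may vary by version, so check its `--help`; paths are placeholders:

```python
# Equivalent shell command (from GPTQ-for-LLaMa's README; flags may vary
# by version):
#   python llama.py /path/to/vicuna-13b c4 --wbits 4 --groupsize 128 \
#       --save_safetensors vicuna-13b-4bit-128g.safetensors
import subprocess

subprocess.run(
    [
        "python", "llama.py",   # run from the GPTQ-for-LLaMa checkout
        "/path/to/vicuna-13b",  # full fp16 model directory
        "c4",                   # calibration dataset
        "--wbits", "4",
        "--groupsize", "128",
        "--save_safetensors", "vicuna-13b-4bit-128g.safetensors",
    ],
    check=True,
)
```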