Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.
My apologies if this is a really stupid question, but is there scope here to add the ability to load 4-bit models? For example, vicuna-13B-1.1-GPTQ-4bit-128g, or even 4-bit 30B LLaMA models, which will squeeze into 24 GB of VRAM. I know this can all be done in other web-UI projects, but having an OpenAI-like API such as this project provides would be amazing.
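As a rough back-of-envelope check of the 24 GB claim (a sketch; the helper name and figures are illustrative, counting weights only and ignoring activations, KV cache, and GPTQ group metadata, which add a few more GB):

```python
# Approximate VRAM needed just for model weights at a given bit width.
# This is an illustrative estimate, not a measurement from any project.

def weight_vram_gb(n_params: float, bits_per_param: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes): params * bits / 8 bits-per-byte."""
    return n_params * bits_per_param / 8 / 1e9

print(weight_vram_gb(13e9, 4))  # 13B at 4-bit -> 6.5 GB of weights
print(weight_vram_gb(30e9, 4))  # 30B at 4-bit -> 15.0 GB, leaving headroom in 24 GB
print(weight_vram_gb(30e9, 16))  # same model at fp16 -> 60.0 GB, far too large
```

So a 4-bit 30B model's weights alone come to about 15 GB, which is why it can plausibly fit on a single 24 GB card while the fp16 version cannot.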