shaunstoltz opened this issue 1 year ago
You can use a command like the one below:

```shell
python3 -m fastchat.serve.model_worker \
    --model-path models/vicuna-7B-1.1-GPTQ-4bit-128g \
    --gptq-ckpt models/vicuna-7B-1.1-GPTQ-4bit-128g/vicuna-7B-1.1-GPTQ-4bit-128g.safetensors \
    --gptq-wbits 4 \
    --gptq-groupsize 128 \
    --gptq-act-order
```
To run the command above, you need to install: https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/fastest-inference-4bit
The whole thing is here, still not merged:
https://github.com/alanxmay/FastChat/tree/fastest-gptq-4bit-support
I'm getting this error:

```
Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory models/.
```

cmd:

```shell
python3 -m fastchat.serve.cli --model-path models/ --num-gpus 4
```
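That error usually means `--model-path` points at a directory that contains none of the checkpoint files transformers looks for when loading a local model. A quick pre-flight check might look like the sketch below (the `find_weights` helper and the exact file list are illustrative, not part of FastChat):

```python
from pathlib import Path

# Checkpoint filenames transformers searches for in a local model directory,
# plus the safetensors name commonly used by GPTQ conversions.
WEIGHT_FILES = (
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
    "model.safetensors",
)

def find_weights(model_dir: str) -> list:
    """Return the known weight files present in model_dir."""
    d = Path(model_dir)
    return [name for name in WEIGHT_FILES if (d / name).exists()]

if __name__ == "__main__":
    # Point this at the model's own folder, not the parent "models/" dir.
    print(find_weights("models/vicuna-7B-1.1-GPTQ-4bit-128g"))
```

If this prints an empty list for the directory you passed to `--model-path`, the CLI will fail the same way; try pointing `--model-path` at the model subdirectory (e.g. `models/vicuna-7B-1.1-GPTQ-4bit-128g`) rather than `models/`.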