Try modifying your model path in `.env`.
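For example, something along these lines (a sketch based on the variables that appear later in this thread; the path is a placeholder and exact keys may differ across versions):

```
# Hypothetical .env sketch — point MODEL_PATH at where your model file actually lives.
MODEL_PATH = "/path/to/your/model.bin"
LLAMA_CPP = True   # required for GGML files; see the discussion below
```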
Do you mind sharing more specific details of your `.env`?
Hi, I got the same error:

```
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/mnt/d/WorkTable/Projects/Temp/llama2/llama2-webui/models/Llama-2-7B-GGML/llama-2-7b.ggmlv3.q8_0.bin'. Use `repo_type` argument if needed.
```
And here's my `.env`:

```
MODEL_PATH = "/mnt/d/WorkTable/Projects/Temp/llama2/llama2-webui/models/Llama-2-7B-GGML/llama-2-7b.ggmlv3.q8_0.bin"
LOAD_IN_8BIT = True
LOAD_IN_4BIT = False
LLAMA_CPP = False
...
```
However, if I change nothing except setting LLAMA_CPP to True, it works fine in CPU mode:
```
Running on CPU with llama.cpp.
llama.cpp: loading model from /mnt/d/WorkTable/Projects/Temp/llama2/llama2-webui/models/Llama-2-7B-GGML/llama-2-7b.ggmlv3.q8_0.bin
...
Running on local URL: http://127.0.0.1:7860
```
I'm using WSL2, and I can run PyTorch with CUDA:

```python
import torch
torch.cuda.is_available()
# True
```
Do you have any suggestions? Thank you.
@smithlai GGML models only work with llama.cpp (usually on CPU). If you want to run models on GPU, try llama2 or GPTQ models.
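To illustrate why the error above happens (a minimal sketch, not the app's exact loading code): the Transformers backend treats MODEL_PATH as a Hub repo id or a local model directory, so a single `.ggmlv3` `.bin` file fails validation, while llama-cpp-python loads exactly such a file.

```python
from transformers import AutoModelForCausalLM
from llama_cpp import Llama

GGML_FILE = "/mnt/d/WorkTable/Projects/Temp/llama2/llama2-webui/models/Llama-2-7B-GGML/llama-2-7b.ggmlv3.q8_0.bin"

# Transformers validates this argument as a repo id / model directory,
# so a single GGML file raises HFValidationError (as in the report above):
# AutoModelForCausalLM.from_pretrained(GGML_FILE)

# llama-cpp-python loads a single GGML file directly:
llm = Llama(model_path=GGML_FILE)
```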
Another way to run a GGML model on GPU is through llama.cpp, by installing llama-cpp-python with GPU (cuBLAS) support.
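Roughly like this (a sketch assuming the GGML-era llama-cpp-python; the build flag and layer count are illustrative, so check the llama-cpp-python docs for your version):

```python
# Build/install llama-cpp-python with cuBLAS support (run in your shell):
#   CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="/mnt/d/WorkTable/Projects/Temp/llama2/llama2-webui/models/Llama-2-7B-GGML/llama-2-7b.ggmlv3.q8_0.bin",
    n_gpu_layers=32,  # number of layers to offload to the GPU; tune for your VRAM
)
```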
WSL2 should be fine; I am also using WSL2 for GPU inference on llama2 and GPTQ models. The GPTQ model loads much faster.
Thanks @liltom-eth, @smithlai. I was able to run on CPU under WSL2 as described above.
@blackhawkee @smithlai you are welcome to contribute your benchmark performance here.
When running the 4-bit model on CPU, I'm receiving the below error: