liltom-eth / llama2-webui

Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend for Generative Agents/Apps.
MIT License

huggingface_hub.utils._validators.HFValidationError on CPU #11

Closed · blackhawkee closed this issue 1 year ago

blackhawkee commented 1 year ago

When running the 4-bit model on CPU, I get the error below:

```
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/model/q4/llama-2-7b-chat.ggmlv3.q4_0.bin'. Use `repo_type` argument if needed.
```
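For context, the error comes from huggingface_hub's repo-id validation, which suggests the local GGML path is being handed to the Hugging Face loader and treated as a Hub repo id. A minimal sketch of the check that raises it (assuming only that `huggingface_hub` is installed; the path is the one from the traceback):

```python
# huggingface_hub validates whatever string it receives as a Hub repo id;
# a local .bin path never matches the 'repo_name' / 'namespace/repo_name' form.
from huggingface_hub.utils import HFValidationError, validate_repo_id

try:
    validate_repo_id("/model/q4/llama-2-7b-chat.ggmlv3.q4_0.bin")
except HFValidationError as err:
    print(err)  # Repo id must be in the form 'repo_name' or 'namespace/repo_name': ...
```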
liltom-eth commented 1 year ago

Try modifying your model path in `.env`.
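For reference, a minimal `.env` sketch for running a GGML file on CPU might look like this (the path is illustrative; the keys are the ones shown in the excerpt later in this thread, and `LLAMA_CPP = True` is what routes the file to llama.cpp rather than the Hugging Face loader):

```
MODEL_PATH = "./models/llama-2-7b-chat.ggmlv3.q4_0.bin"
LOAD_IN_8BIT = False
LOAD_IN_4BIT = False
LLAMA_CPP = True
```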

liltom-eth commented 1 year ago

Do you mind listing more specific details of your env?

smithlai commented 1 year ago

Hi, I got the same error:

```
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/mnt/d/WorkTable/Projects/Temp/llama2/llama2-webui/models/Llama-2-7B-GGML/llama-2-7b.ggmlv3.q8_0.bin'. Use `repo_type` argument if needed.
```

And here's my `.env`:

```
MODEL_PATH = "/mnt/d/WorkTable/Projects/Temp/llama2/llama2-webui/models/Llama-2-7B-GGML/llama-2-7b.ggmlv3.q8_0.bin"
LOAD_IN_8BIT = True
LOAD_IN_4BIT = False
LLAMA_CPP = False
.........
```

However, if I change nothing except setting LLAMA_CPP to True, it works well in CPU mode:

```
Running on CPU with llama.cpp.
llama.cpp: loading model from /mnt/d/WorkTable/Projects/Temp/llama2/llama2-webui/models/Llama-2-7B-GGML/llama-2-7b.ggmlv3.q8_0.bin
.....
....
Running on local URL:  http://127.0.0.1:7860
```

I'm using WSL2, and I can run PyTorch with CUDA:

```python
import torch
torch.cuda.is_available()
# True
```

Do you have any suggestions? Thank you.

liltom-eth commented 1 year ago

@smithlai GGML models only work on llama.cpp (usually CPU). If you want to run models on GPU, try llama2 or GPTQ models.
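For illustration, a rough sketch of the two loading paths (not the exact `llama2-wrapper` code; the file path and Hub repo id below are examples, and GPTQ models additionally need a GPTQ-aware loader):

```python
# GGML file -> llama.cpp backend (CPU by default)
from llama_cpp import Llama
ggml_llm = Llama(model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin")

# HF weights -> transformers backend, placed on GPU
import torch
from transformers import AutoModelForCausalLM
hf_llm = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # example Hub repo id, not a local path
    torch_dtype=torch.float16,
    device_map="auto",
)
```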

Another way to run a GGML model on GPU is through llama.cpp, by installing it with GPU support.
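A minimal sketch of that, assuming `llama-cpp-python` was built with GPU (cuBLAS) support; the path and layer count are examples:

```python
from llama_cpp import Llama

# n_gpu_layers offloads that many transformer layers to the GPU;
# it only has an effect when llama-cpp-python is compiled with GPU support.
llm = Llama(
    model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin",
    n_gpu_layers=32,  # example value
)
```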

liltom-eth commented 1 year ago

WSL2 should be fine; I am also using WSL2 for GPU inference on llama2 and GPTQ models. The GPTQ model loads much faster.

blackhawkee commented 1 year ago

Thanks @liltom-eth, @smithlai. Able to run on CPU and WSL2 as described above.

liltom-eth commented 1 year ago

@blackhawkee @smithlai you are welcome to contribute your benchmark performance here.