liltom-eth / llama2-webui

Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend for Generative Agents/Apps.
MIT License

Unable to load 70B llama2 on cpu (llama cpp) #66

Open Dougie777 opened 1 year ago

Dougie777 commented 1 year ago

```
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
```
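For context (my reading, not confirmed in this thread): the mismatch is exactly what grouped-query attention produces. Per the published Llama 2 architecture, the 70B model uses 8 key/value heads instead of the full 64, so its key projection is narrower than a loader assuming the 7B/13B multi-head layout would expect. A quick sanity check of the dimensions:

```python
# Shape check for the Llama 2 attention key projection (wk).
# Assumption (published Llama 2 architecture): 70B uses grouped-query
# attention with 8 KV heads; 7B/13B use full multi-head attention
# (n_kv_heads == n_heads), which is why they load fine.
n_heads = 64       # query heads in Llama 2 70B
n_kv_heads = 8     # key/value heads (GQA)
head_dim = 128
hidden = n_heads * head_dim            # 8192, the model's hidden size
wk_shape = (hidden, n_kv_heads * head_dim)
print(wk_shape)  # (8192, 1024) -- the "got" shape in the error
```

So the checkpoint is fine; the loader is expecting the non-GQA layout.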

The exact same settings and quantization work for 7B and 13B. Here is my `.env` (comment markers restored, since GitHub swallowed the `#` lines):

```
MODEL_PATH = ""
# if MODEL_PATH is "", default llama.cpp/gptq models
# will be downloaded to: ./models
# Example ggml path:
# MODEL_PATH = "./models/llama-2-7b-chat.ggmlv3.q4_0.bin"
MODEL_PATH = "./models/llama-2-70b-chat.ggmlv3.q4_0.bin"
# MODEL_PATH = "./models/llama-2-13b-chat.ggmlv3.q4_0.bin"

# options: llama.cpp, gptq, transformers
BACKEND_TYPE = "llama.cpp"

# only for transformers bitsandbytes 8 bit
LOAD_IN_8BIT = False

MAX_MAX_NEW_TOKENS = 2048
DEFAULT_MAX_NEW_TOKENS = 1024
MAX_INPUT_TOKEN_LENGTH = 4000

DEFAULT_SYSTEM_PROMPT = ""
```
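A likely culprit (an assumption on my part, not something stated in this thread): GGML-era llama.cpp needed the grouped-query attention factor passed explicitly for 70B models (the `-gqa 8` CLI flag, or `n_gqa=8` in llama-cpp-python of that era), and a backend that never sets it fails with exactly this shape error. A hypothetical sketch of how a loader could pick its kwargs:

```python
# Hypothetical helper (illustrative name, not part of llama2-wrapper):
# choose llama-cpp-python keyword arguments, adding the GGML-era `n_gqa`
# parameter for 70B checkpoints.
def llama_cpp_kwargs(model_path: str) -> dict:
    kwargs = {"model_path": model_path}
    if "70b" in model_path.lower():
        # Llama 2 70B uses grouped-query attention with 8 KV heads;
        # without n_gqa=8, GGML-era llama.cpp rejects the wk tensor shape.
        kwargs["n_gqa"] = 8
    return kwargs

print(llama_cpp_kwargs("./models/llama-2-70b-chat.ggmlv3.q4_0.bin"))
```

If llama2-wrapper's llama.cpp backend doesn't pass anything like this through, that would explain why only 70B fails.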

liltom-eth commented 1 year ago

@Dougie777 The env looks good to me. The error might come from the 70B model itself.