Dougie777 opened this issue 1 year ago
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
The exact same settings and quantization work for 7B and 13B. Here is my .env:
MODEL_PATH = ""
MODEL_PATH = "./models/llama-2-70b-chat.ggmlv3.q4_0.bin"
BACKEND_TYPE = "llama.cpp"
LOAD_IN_8BIT = False
MAX_MAX_NEW_TOKENS = 2048 DEFAULT_MAX_NEW_TOKENS = 1024 MAX_INPUT_TOKEN_LENGTH = 4000
DEFAULT_SYSTEM_PROMPT = ""
@Dougie777 the .env looks good to me. The error is likely specific to the 70B model.
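For context: Llama 2 70B uses grouped-query attention (64 query heads sharing 8 key/value heads), so its wk.weight genuinely is 8192 x 1024 rather than 8192 x 8192, and a GGML loader that assumes standard multi-head attention rejects it with exactly this shape error. On GGML-era llama-cpp-python builds the usual workaround was to pass n_gqa=8 when loading a 70B model. A minimal sketch, assuming the model path from the .env above and a llama-cpp-python version that still accepts GGML files:

```python
# Minimal sketch: loading a 70B GGML model with llama-cpp-python.
# n_gqa=8 declares the grouped-query-attention ratio (64 query heads / 8 KV heads);
# without it the loader expects wk.weight to be 8192 x 8192 and fails.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-70b-chat.ggmlv3.q4_0.bin",
    n_gqa=8,     # required for 70B GGML models; 7B/13B load without it
    n_ctx=4096,  # Llama 2 context window
)

out = llm("What is grouped-query attention?", max_tokens=64)
print(out["choices"][0]["text"])
```

7B and 13B keep one KV head per query head, which is why the same .env works for them unchanged.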