intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Runtime error when loading native llama2 70b INT4 model #9034

Open · dongmovidius opened this issue 12 months ago

dongmovidius commented 12 months ago

I'd like to run llama2 on CPU.

I run the code below to load the llama2 70b native INT4 model:

from bigdl.llm.transformers import LlamaForCausalLM
llm = LlamaForCausalLM.from_pretrained("D:/llama/model/ggml-llama2-70b-q4_0.bin", native=True) 

2023-09-22 16:32:23,278 - ERROR - ****Usage Error**** The attribute ctx of Llama object is None.
2023-09-22 16:32:23,279 - ERROR - ****Call Stack*****
2023-09-22 16:32:23,279 - ERROR - ****Usage Error**** Could not load model from path: D:/llama/model/ggml-llama2-70b-q4_0.bin. Please make sure the CausalLM class matches the model you want to load.Received error The attribute ctx of Llama object is None.
2023-09-22 16:32:23,280 - ERROR - ****Call Stack*****

RuntimeError                              Traceback (most recent call last)
File D:\Program\Anaconda3\envs\bigdl_llm\lib\site-packages\bigdl\llm\transformers\modelling_bigdl.py:119, in _BaseGGMLClass.from_pretrained(cls, pretrained_model_name_or_path, native, dtype, *args, **kwargs)
    118     ggml_model_path = pretrained_model_name_or_path
--> 119     model = cls(model_path=ggml_model_path, **kwargs)
    120 else:

File D:\Program\Anaconda3\envs\bigdl_llm\lib\site-packages\bigdl\llm\ggml\model\llama\llama.py:211, in Llama.__init__(self, model_path, n_ctx, n_parts, n_gpu_layers, seed, f16_kv, logits_all, vocab_only, use_mmap, use_mlock, embedding, n_threads, n_batch, last_n_tokens_size, lora_base, lora_path, verbose)
    207 self.ctx = llama_cpp.llama_init_from_file(
    208     self.model_path.encode("utf-8"), self.params
    209 )
--> 211 invalidInputError(self.ctx is not None, "The attribute ctx of Llama object is None.")
    213 if self.lora_path:

File D:\Program\Anaconda3\envs\bigdl_llm\lib\site-packages\bigdl\llm\utils\common\log4Error.py:32, in invalidInputError(condition, errMsg, fixMsg)
     31     outputUserMessage(errMsg, fixMsg)
---> 32     raise RuntimeError(errMsg)

RuntimeError: The attribute ctx of Llama object is None.

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
Cell In[4], line 4
      1 # load the converted model
      2 # switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models
      3 from bigdl.llm.transformers import LlamaForCausalLM
----> 4 llm = LlamaForCausalLM.from_pretrained("D:/llama/model/ggml-llama2-70b-q4_0.bin", native=True)

File D:\Program\Anaconda3\envs\bigdl_llm\lib\site-packages\bigdl\llm\transformers\modelling_bigdl.py:124, in _BaseGGMLClass.from_pretrained(cls, pretrained_model_name_or_path, native, dtype, *args, **kwargs)
    121     model = cls.HF_Class.from_pretrained(pretrained_model_name_or_path,
    122                                          *args, **kwargs)
    123 except Exception as e:
--> 124     invalidInputError(
    125         False,
    126         f"Could not load model from path: {pretrained_model_name_or_path}. "
    127         f"Please make sure the CausalLM class matches "
    128         "the model you want to load."
    129         f"Received error {e}"
    130     )
    131 return model

File D:\Program\Anaconda3\envs\bigdl_llm\lib\site-packages\bigdl\llm\utils\common\log4Error.py:32, in invalidInputError(condition, errMsg, fixMsg)
     30 if not condition:
     31     outputUserMessage(errMsg, fixMsg)
---> 32     raise RuntimeError(errMsg)

RuntimeError: Could not load model from path: D:/llama/model/ggml-llama2-70b-q4_0.bin. Please make sure the CausalLM class matches the model you want to load.Received error The attribute ctx of Llama object is None.

sgwhat commented 12 months ago

Hi Yang Dong,

Currently, the LlamaForCausalLM API does not support llama2-70b, but it is compatible with other llama-family models. You may refer to the following command to run the llama2-70b model in native INT4 format using the BigDL-LLM CLI tool:

llm-cli -t 16 -x llama -m "ggml-llama2-70b-q4_0.bin" -p PROMPT
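Here -t sets the number of CPU threads, -x the model family, -m the path to the converted model file, and -p the prompt.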
dongmovidius commented 12 months ago

Hi Song Ge, should I build llm-cli from scratch, or can I copy it from somewhere? It looks like llm-cli is NOT available. I can run llm-convert, as shown below:

(bigdl_llm) D:\bigdl-llm-tutorial-main>llm-convert
usage: llm-convert [-h] -o OUTFILE -x MODEL_FAMILY -f MODEL_FORMAT [-t OUTTYPE] [-p TMP_PATH] [-k TOKENIZER_PATH]
                   model
llm-convert: error: the following arguments are required: model, -o/--outfile, -x/--model-family, -f/--model-format

(bigdl_llm) D:\bigdl-llm-tutorial-main>llm-cli
'llm-cli' is not recognized as an internal or external command,
operable program or batch file.
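
(For reference, a conversion command matching the llm-convert usage string above would look something like the following; the input path is hypothetical, -f pth assumes a PyTorch-format checkpoint, and -t int4 selects the INT4 output type:)

llm-convert "D:/llama/model/Llama-2-70b" -o "D:/llama/model/" -x llama -f pth -t int4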
rnwang04 commented 11 months ago

@dongmovidius did you run this command in the Windows Command Prompt? llm-cli should be run in the Anaconda PowerShell Prompt.

MeouSker77 commented 11 months ago

Sorry for the inconvenience, but LLAMA-2-70B native INT4 requires an environment variable, LLAMA_GQA: you must set it to 8 to run LLAMA-2-70B, and set it to 1 (or leave it unset) to run other LLaMA-family models.
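
For example, a minimal sketch in Python (assuming the variable is read when the native model is loaded, so it must be set before from_pretrained is called):

import os

# LLaMA-2-70B uses grouped-query attention; per the comment above, the native
# loader needs LLAMA_GQA=8 for it. Set this BEFORE loading the model.
os.environ["LLAMA_GQA"] = "8"

from bigdl.llm.transformers import LlamaForCausalLM
llm = LlamaForCausalLM.from_pretrained("D:/llama/model/ggml-llama2-70b-q4_0.bin", native=True)

(In the Anaconda PowerShell Prompt, the equivalent before running llm-cli would be $env:LLAMA_GQA = "8".)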

hkvision commented 11 months ago

@dongmovidius Any follow-up on this? Is this issue resolved?