intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Runtime error when loading native llama2 70b INT4 model #9034

Open · dongmovidius opened this issue 12 months ago

dongmovidius commented 12 months ago

I'd like to run llama2 on CPU.

I run the code below to load the llama2 70b native INT4 model:

from bigdl.llm.transformers import LlamaForCausalLM
llm = LlamaForCausalLM.from_pretrained("D:/llama/model/ggml-llama2-70b-q4_0.bin", native=True) 

2023-09-22 16:32:23,278 - ERROR - ****Usage Error**** The attribute ctx of Llama object is None.
2023-09-22 16:32:23,279 - ERROR - ****Call Stack*****
2023-09-22 16:32:23,279 - ERROR - ****Usage Error**** Could not load model from path: D:/llama/model/ggml-llama2-70b-q4_0.bin. Please make sure the CausalLM class matches the model you want to load.Received error The attribute ctx of Llama object is None.
2023-09-22 16:32:23,280 - ERROR - ****Call Stack*****

RuntimeError                              Traceback (most recent call last)
File D:\Program\Anaconda3\envs\bigdl_llm\lib\site-packages\bigdl\llm\transformers\modelling_bigdl.py:119, in _BaseGGMLClass.from_pretrained(cls, pretrained_model_name_or_path, native, dtype, *args, **kwargs)
    118     ggml_model_path = pretrained_model_name_or_path
--> 119     model = cls(model_path=ggml_model_path, **kwargs)
    120 else:

File D:\Program\Anaconda3\envs\bigdl_llm\lib\site-packages\bigdl\llm\ggml\model\llama\llama.py:211, in Llama.__init__(self, model_path, n_ctx, n_parts, n_gpu_layers, seed, f16_kv, logits_all, vocab_only, use_mmap, use_mlock, embedding, n_threads, n_batch, last_n_tokens_size, lora_base, lora_path, verbose)
    207 self.ctx = llama_cpp.llama_init_from_file(
    208     self.model_path.encode("utf-8"), self.params
    209 )
--> 211 invalidInputError(self.ctx is not None, "The attribute ctx of Llama object is None.")
    213 if self.lora_path:

File D:\Program\Anaconda3\envs\bigdl_llm\lib\site-packages\bigdl\llm\utils\common\log4Error.py:32, in invalidInputError(condition, errMsg, fixMsg)
     31     outputUserMessage(errMsg, fixMsg)
---> 32     raise RuntimeError(errMsg)

RuntimeError: The attribute ctx of Llama object is None.

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
Cell In[4], line 4
      1 # load the converted model
      2 # switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models
      3 from bigdl.llm.transformers import LlamaForCausalLM
----> 4 llm = LlamaForCausalLM.from_pretrained("D:/llama/model/ggml-llama2-70b-q4_0.bin", native=True)

File D:\Program\Anaconda3\envs\bigdl_llm\lib\site-packages\bigdl\llm\transformers\modelling_bigdl.py:124, in _BaseGGMLClass.from_pretrained(cls, pretrained_model_name_or_path, native, dtype, *args, **kwargs)
    121     model = cls.HF_Class.from_pretrained(pretrained_model_name_or_path,
    122                                          *args, **kwargs)
    123 except Exception as e:
--> 124     invalidInputError(
    125         False,
    126         f"Could not load model from path: {pretrained_model_name_or_path}. "
    127         f"Please make sure the CausalLM class matches "
    128         "the model you want to load."
    129         f"Received error {e}"
    130     )
    131 return model

File D:\Program\Anaconda3\envs\bigdl_llm\lib\site-packages\bigdl\llm\utils\common\log4Error.py:32, in invalidInputError(condition, errMsg, fixMsg)
     30 if not condition:
     31     outputUserMessage(errMsg, fixMsg)
---> 32     raise RuntimeError(errMsg)

RuntimeError: Could not load model from path: D:/llama/model/ggml-llama2-70b-q4_0.bin. Please make sure the CausalLM class matches the model you want to load.Received error The attribute ctx of Llama object is None.

sgwhat commented 12 months ago

Hi Yang Dong,

Currently, the LlamaForCausalLM API does not support llama2-70b, but it is compatible with other llama-family models. You may refer to the following command to run the llama2-70b model in native INT4 format using the BigDL-LLM CLI tool:

llm-cli -t 16 -x llama -m "ggml-llama2-70b-q4_0.bin" -p PROMPT
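Here -t sets the number of CPU threads, -x the model family, -m the path to the converted model file, and -p the prompt.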
dongmovidius commented 12 months ago

Hi Song Ge, should I build llm-cli from scratch, or can I copy it from somewhere? It looks like llm-cli is NOT available. I can run llm-convert, as shown below:

(bigdl_llm) D:\bigdl-llm-tutorial-main>llm-convert
usage: llm-convert [-h] -o OUTFILE -x MODEL_FAMILY -f MODEL_FORMAT [-t OUTTYPE] [-p TMP_PATH] [-k TOKENIZER_PATH]
                   model
llm-convert: error: the following arguments are required: model, -o/--outfile, -x/--model-family, -f/--model-format

(bigdl_llm) D:\bigdl-llm-tutorial-main>llm-cli
'llm-cli' is not recognized as an internal or external command,
operable program or batch file.
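
(For reference, a conversion command matching the llm-convert usage string above would look something like the following; the input path is hypothetical, -f pth assumes a PyTorch-format checkpoint, and -t int4 selects the INT4 output type:)

llm-convert "D:/llama/model/Llama-2-70b" -o "D:/llama/model/" -x llama -f pth -t int4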
rnwang04 commented 11 months ago

@dongmovidius did you run this command in the Windows Command Prompt? llm-cli should be run in the Anaconda PowerShell Prompt.

MeouSker77 commented 11 months ago

Sorry for the inconvenience, but LLAMA-2-70B native INT4 requires an environment variable, LLAMA_GQA: you must set it to 8 to run LLAMA-2-70B, and set it to 1 (or leave it unset) to run other LLaMA-family models.
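
For example, a minimal sketch in Python (assuming the variable is read when the native model is loaded, so it must be set before from_pretrained is called):

import os

# LLaMA-2-70B uses grouped-query attention; per the comment above, the native
# loader needs LLAMA_GQA=8 for it. Set this BEFORE loading the model.
os.environ["LLAMA_GQA"] = "8"

from bigdl.llm.transformers import LlamaForCausalLM
llm = LlamaForCausalLM.from_pretrained("D:/llama/model/ggml-llama2-70b-q4_0.bin", native=True)

(In the Anaconda PowerShell Prompt, the equivalent before running llm-cli would be $env:LLAMA_GQA = "8".)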

hkvision commented 11 months ago

@dongmovidius Any follow-up on this? Is this issue resolved?