Hi @olegmikul,
To resolve the Chatbot issue, you'll need to install the additional requirements listed in intel_extension_for_transformers/neural_chat/requirements_cpu.txt before running the chatbot.
For the INT4 Inference issue, please run pip install intel-extension-for-transformers, or perform a source installation with pip install -e . from within the intel_extension_for_transformers directory.
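A consolidated sketch of those two fixes, assuming the repository has already been cloned and the commands are run from its root directory:

```bash
# Chatbot: install the additional CPU requirements for neural_chat
pip install -r intel_extension_for_transformers/neural_chat/requirements_cpu.txt

# INT4 inference: install the package from PyPI ...
pip install intel-extension-for-transformers
# ... or do an editable source install instead
pip install -e .
```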
hi, @lvliang-intel,
Thanks, it partially helps:
I. Chatbot
AssertionError                            Traceback (most recent call last)
Cell In[12], line 1
----> 1 model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

File ~/py3p10_itrex/lib/python3.10/site-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py:173, in _BaseQBitsAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    170     from intel_extension_for_transformers.llm.runtime.graph import Model
    172     model = Model()
--> 173     model.init(
    174         pretrained_model_name_or_path,
    175         weight_dtype=quantization_config.weight_dtype,
    176         alg=quantization_config.scheme,
    177         group_size=quantization_config.group_size,
    178         scale_dtype=quantization_config.scale_dtype,
    179         compute_dtype=quantization_config.compute_dtype,
    180         use_ggml=quantization_config.use_ggml,
    181         use_quant=quantization_config.use_quant,
    182         use_gptq=quantization_config.use_gptq,
    183     )
    184     return model
    185 else:

File ~/py3p10_itrex/lib/python3.10/site-packages/intel_extension_for_transformers/llm/runtime/graph/__init__.py:118, in Model.init(self, model_name, use_quant, use_gptq, **quant_kwargs)
    116 if not os.path.exists(fp32_bin):
    117     convert_model(model_name, fp32_bin, "f32")
--> 118 assert os.path.exists(fp32_bin), "Fail to convert pytorch model"
    120 if not use_quant:
    121     print("FP32 model will be used.")

AssertionError: Fail to convert pytorch model ...
I have just tried the "INT4 Inference (CPU only)" example. It seems that:
- on the first run (no runtime_outs/ne_mistral_q_nf4_jblas_cfp32_g32.bin generated yet), the model name ("Intel/neural-chat-7b-v3-1") won't work; I need to pass the model path instead (something like .cache/huggingface/hub/models--Intel--neural-chat-7b-v3-1/snapshots/6dbd30b1d5720fde2beb0122084286d887d24b40);
- on later runs, the model_name works fine.
I wonder if this is the expected behavior.
Yes, for Intel/neural-chat-7b-v3-1 you first need to download the model to disk and then pass us the local path. Only the llama / mistral / neural-chat models need this step; other models should be fine with just the HF model id.
We will support these models without a local path soon.
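A minimal sketch of that workflow, assuming huggingface_hub is installed; snapshot_download is used here only to fetch the model and obtain the local cache path that is passed to from_pretrained on the first run:

```python
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

# Download "Intel/neural-chat-7b-v3-1" to the local HF cache and get its snapshot path
local_path = snapshot_download(repo_id="Intel/neural-chat-7b-v3-1")

# On the first INT4 run, pass the local path instead of the HF model id
tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(local_path, load_in_4bit=True)

prompt = "Once upon a time, there existed a little girl,"
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
```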
Hi, @Tuanshu,
Thanks, it works! It read a poem about a little girl who can see :)
@a32543254 , @lvliang-intel
It would be extremely useful to put necessary details in a README file to avoid questions from newcomers, like me.
Chatbot issues remain, though...
Same errors on 3 different Linux distros.
I have installed from source:

pushd intel-extension-for-transformers/
pip install -r requirements.txt
python setup.py install
Then I started trying the examples from the README (obviously, my first steps after installing):
And finally got the following errors.

The chatbot example fails on import:

from intel_extension_for_transformers.neural_chat import build_chatbot

PydanticImportError: `BaseSettings` has been moved to the `pydantic-settings` package. See https://docs.pydantic.dev/2.5/migration/#basesettings-has-moved-to-pydantic-settings for more details.

The "INT4 Inference (CPU only)" example:

from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
model_name = "Intel/neural-chat-7b-v3-1"  # Hugging Face model_id or local model
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)

ModuleNotFoundError: No module named 'intel_extension_for_transformers.llm.runtime.graph.mistral_cpp'