intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Apache License 2.0

None of the examples on the README page work #1117

Closed: olegmikul closed this issue 3 months ago

olegmikul commented 8 months ago

Same errors on 3 different Linux distros.

I have installed from source:

```bash
pushd intel-extension-for-transformers/
pip install -r requirements.txt
python setup.py install
```

Then I started trying the examples from the README (obviously, my first steps after installing):

1. Chatbot: a lot of missing dependencies. I figured out the names from the error messages and installed them one by one:

```bash
pip install uvicorn
pip install yacs
pip install fastapi
pip install shortuuid
pip install python-multipart
pip install python-dotenv
```

And finally got the following error (see the note after this list):

```python
from intel_extension_for_transformers.neural_chat import build_chatbot
```

PydanticImportError: BaseSettings has been moved to the pydantic-settings package. See https://docs.pydantic.dev/2.5/migration/#basesettings-has-moved-to-pydantic-settings for more details.

2. INT4 Inference (CPU only)

```python
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "Intel/neural-chat-7b-v3-1"  # Hugging Face model_id or local model
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
```

ModuleNotFoundError: No module named 'intel_extension_for_transformers.llm.runtime.graph.mistral_cpp'

3. INT8 Inference (CPU only) - same error
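For reference, the BaseSettings migration that the Chatbot error above points to looks roughly like the following. This is a minimal sketch of the pydantic v1 to v2 change, not the actual neural_chat code; the AppSettings class is a hypothetical illustration.

```python
# pydantic v1 exposed BaseSettings directly:
#   from pydantic import BaseSettings
# In pydantic v2 it moved to the separate pydantic-settings package
# (pip install pydantic-settings):
from pydantic_settings import BaseSettings


class AppSettings(BaseSettings):
    # Hypothetical settings class, only for illustration.
    host: str = "0.0.0.0"
    port: int = 8000


settings = AppSettings()
print(settings.host, settings.port)
```

A library written against the v1 import fails with exactly this PydanticImportError once a v2 pydantic is installed, which is why the error appears even though pydantic itself is present.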
lvliang-intel commented 8 months ago

Hi @olegmikul, to resolve the Chatbot issue, you'll need to install the additional requirements file located at intel_extension_for_transformers/neural_chat/requirements_cpu.txt before running the chatbot.

For the INT4 Inference issue, please execute `pip install intel-extension-for-transformers` or perform a source code installation using `pip install -e .` within the intel_extension_for_transformers directory.

olegmikul commented 8 months ago

hi, @lvliang-intel,

Thanks, it partially helps:

I. Chatbot

1. On my Linux (Arch Linux) system with GPU and CUDA the chatbot works (I needed to install both requirements.txt and requirements_cpu.txt to make it work).
2. On another of my Linux systems (same Arch Linux OS), without GPU/CUDA, the chatbot doesn't work:

```
...
In [4]: chatbot = build_chatbot()
2024-01-09 23:09:10 [ERROR] neuralchat error: System has run out of storage
```

3. On my laptop (Ultra 7 155H, Meteor Lake; Linux, Ubuntu & Arch Linux) it doesn't work (and yes, I've installed intel-extension-for-transformers both ways):

```
In [4]: chatbot = build_chatbot()
Loading model Intel/neural-chat-7b-v3-1
model.safetensors.index.json: 100%|████████| 25.1k/25.1k [00:00<00:00, 77.8MB/s]
model-00001-of-00002.safetensors: 100%|█████| 9.94G/9.94G [01:33<00:00, 106MB/s]
model-00002-of-00002.safetensors: 100%|████| 4.54G/4.54G [00:55<00:00, 81.5MB/s]
Downloading shards: 100%|█████████████████████████| 2/2 [02:29<00:00, 74.80s/it]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:03<00:00,  1.91s/it]
generation_config.json: 100%|███████████████████| 111/111 [00:00<00:00, 753kB/s]
2024-01-09 20:04:11 [ERROR] neuralchat error: Generic error
...
```

II. Inference INT*: same error everywhere:

```
...
FileNotFoundError: [Errno 2] No such file or directory: 'Intel/neural-chat-7b-v3-1'

AssertionError                            Traceback (most recent call last)
Cell In[12], line 1
----> 1 model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

File ~/py3p10_itrex/lib/python3.10/site-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py:173, in _BaseQBitsAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    170 from intel_extension_for_transformers.llm.runtime.graph import Model
    172 model = Model()
--> 173 model.init(
    174     pretrained_model_name_or_path,
    175     weight_dtype=quantization_config.weight_dtype,
    176     alg=quantization_config.scheme,
    177     group_size=quantization_config.group_size,
    178     scale_dtype=quantization_config.scale_dtype,
    179     compute_dtype=quantization_config.compute_dtype,
    180     use_ggml=quantization_config.use_ggml,
    181     use_quant=quantization_config.use_quant,
    182     use_gptq=quantization_config.use_gptq,
    183 )
    184 return model
    185 else:

File ~/py3p10_itrex/lib/python3.10/site-packages/intel_extension_for_transformers/llm/runtime/graph/__init__.py:118, in Model.init(self, model_name, use_quant, use_gptq, **quant_kwargs)
    116 if not os.path.exists(fp32_bin):
    117     convert_model(model_name, fp32_bin, "f32")
--> 118 assert os.path.exists(fp32_bin), "Fail to convert pytorch model"
    120 if not use_quant:
    121     print("FP32 model will be used.")

AssertionError: Fail to convert pytorch model
...
```

Tuanshu commented 8 months ago

I have just tried the "INT4 Inference (CPU only)" example. It seems that:

If it is the first run (no runtime_outs/ne_mistral_q_nf4_jblas_cfp32_g32.bin generated yet), the model name ("Intel/neural-chat-7b-v3-1") won't work; I need to pass the model path instead (something like .cache/huggingface/hub/models--Intel--neural-chat-7b-v3-1/snapshots/6dbd30b1d5720fde2beb0122084286d887d24b40).

In later runs, the model_name works fine.

I wonder if this is the expected behavior.

a32543254 commented 8 months ago

> I have just tried the "INT4 Inference (CPU only)" example. It seems that:
>
> If it is the first run (no runtime_outs/ne_mistral_q_nf4_jblas_cfp32_g32.bin generated yet), the model name ("Intel/neural-chat-7b-v3-1") won't work; I need to pass the model path instead (something like .cache/huggingface/hub/models--Intel--neural-chat-7b-v3-1/snapshots/6dbd30b1d5720fde2beb0122084286d887d24b40).
>
> In later runs, the model_name works fine.
>
> I wonder if this is the expected behavior.

Yes, for Intel/neural-chat-7b-v3-1, you need to first download the model to disk and then pass the local path to us. Only the llama / mistral / neural-chat models need this process; other models should be fine with just the HF model id.

And we will support them without using a local path soon.
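Until then, here is a minimal sketch of the workaround described above, assuming huggingface_hub is available (it is pulled in by transformers); snapshot_download resolves the local snapshot directory so it doesn't have to be typed by hand:

```python
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, TextStreamer

from intel_extension_for_transformers.transformers import AutoModelForCausalLM

# Download the model (or reuse the cached copy) and get its local directory.
local_path = snapshot_download(repo_id="Intel/neural-chat-7b-v3-1")

prompt = "Once upon a time, there existed a little girl,"
tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

# On the first run, pass the local directory instead of the HF model id.
model = AutoModelForCausalLM.from_pretrained(local_path, load_in_4bit=True)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
```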

olegmikul commented 8 months ago

Hi, @Tuanshu,

Thanks, it works! I read a poem about a little girl who can see :)

@a32543254 , @lvliang-intel

It would be extremely useful to put the necessary details in the README file to avoid questions from newcomers like me.

The chatbot issues remain, though...