Closed: cphoward closed this issue 10 months ago.
Hi, for this error:
raise ValueError(f"weight_dtype must be 'int4', 'int8'.")
ValueError: weight_dtype must be 'int4', 'int8'.
The reason is that pip install intel-extension-for-transformers installed an older release that does not support the latest int4/int8 quantization feature. That's also one of the reasons you can't get any Transformer-based API examples working.
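To confirm which version pip actually installed, here is a quick check (a minimal sketch using the standard-library importlib.metadata):
from importlib.metadata import version
# Prints the installed ITREX release; older releases lack the int4/int8 runtime path
print(version("intel-extension-for-transformers"))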
Please install ITREX from source and try again. I have run through the whole installation process myself; it works if you follow these commands:
git clone https://github.com/intel/intel-extension-for-transformers
cd intel-extension-for-transformers
pip install -r requirements.txt
pip install transformers==4.33.1
python setup.py install
Please put your script outside of the intel-extension-for-transformers root directory so that it imports the API from the environment you installed into rather than from the local ITREX source directories.
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
model_name = "Intel/neural-chat-7b-v1-1" # Hugging Face model_id or local model
prompt = "Once upon a time, there existed a little girl,"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
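To verify that the script imports the package installed in your environment rather than the local checkout, a quick check (a minimal sketch) is:
import intel_extension_for_transformers
# This should point into your environment's site-packages, not the cloned repo
print(intel_extension_for_transformers.__file__)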
Inference screenshot:
Here is what I get when running:
python3.8 -m venv venv
. venv/bin/activate
git clone https://github.com/intel/intel-extension-for-transformers
cd intel-extension-for-transformers
pip install -r requirements.txt
pip install transformers==4.33.1
python setup.py install
error: huggingface-hub 0.19.4 is installed but huggingface_hub<0.18,>=0.16.4 is required by {'tokenizers'}
Attempting to resolve that error with:
pip install huggingface-hub==0.17.3
# Output
Collecting huggingface-hub==0.17.3
Downloading huggingface_hub-0.17.3-py3-none-any.whl (295 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 295.0/295.0 kB 6.1 MB/s eta 0:00:00
Requirement already satisfied: packaging>=20.9 in ./venv/lib/python3.8/site-packages (from huggingface-hub==0.17.3) (23.2)
Requirement already satisfied: requests in ./venv/lib/python3.8/site-packages (from huggingface-hub==0.17.3) (2.31.0)
Requirement already satisfied: fsspec in ./venv/lib/python3.8/site-packages (from huggingface-hub==0.17.3) (2023.10.0)
Requirement already satisfied: filelock in ./venv/lib/python3.8/site-packages (from huggingface-hub==0.17.3) (3.13.1)
Requirement already satisfied: tqdm>=4.42.1 in ./venv/lib/python3.8/site-packages (from huggingface-hub==0.17.3) (4.66.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in ./venv/lib/python3.8/site-packages (from huggingface-hub==0.17.3) (4.8.0)
Requirement already satisfied: pyyaml>=5.1 in ./venv/lib/python3.8/site-packages (from huggingface-hub==0.17.3) (6.0.1)
Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.8/site-packages (from requests->huggingface-hub==0.17.3) (2023.11.17)
Requirement already satisfied: idna<4,>=2.5 in ./venv/lib/python3.8/site-packages (from requests->huggingface-hub==0.17.3) (3.6)
Requirement already satisfied: charset-normalizer<4,>=2 in ./venv/lib/python3.8/site-packages (from requests->huggingface-hub==0.17.3) (3.3.2)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./venv/lib/python3.8/site-packages (from requests->huggingface-hub==0.17.3) (2.1.0)
Installing collected packages: huggingface-hub
Attempting uninstall: huggingface-hub
Found existing installation: huggingface-hub 0.19.4
Uninstalling huggingface-hub-0.19.4:
Successfully uninstalled huggingface-hub-0.19.4
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
datasets 2.15.0 requires huggingface-hub>=0.18.0, but you have huggingface-hub 0.17.3 which is incompatible.
intel-extension-for-transformers 1.3rc2.dev30+g318e5cbf22 requires transformers==4.34.1, but you have transformers 4.33.1 which is incompatible.
Successfully installed huggingface-hub-0.17.3
Rerunning:
python setup.py install
results in:
Beginning with Matplotlib 3.8, Python 3.9 or above is required.
You are using Python 3.8.18.
This may be due to an out of date pip.
Make sure you have pip >= 9.0.1.
So retrying with Python 3.9:
sudo apt install python3.9 python3.9-venv python3.9-dev python3-pip;
python3.9 -m venv python3.9-venv
. python3.9-venv/bin/activate
pip install -U pip
pip install -r requirements.txt
pip install transformers==4.33.1
python setup.py install
Complains about:
error: huggingface-hub 0.19.4 is installed but huggingface_hub<0.18,>=0.16.4 is required by {'tokenizers'}
So I install the preferred huggingface-hub version and rerun setup.py install:
pip install huggingface-hub==0.17.3
python setup.py install
# output
Collecting huggingface-hub==0.17.3
Downloading huggingface_hub-0.17.3-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: filelock in ./python3.9-venv/lib/python3.9/site-packages (from huggingface-hub==0.17.3) (3.13.1)
Requirement already satisfied: fsspec in ./python3.9-venv/lib/python3.9/site-packages (from huggingface-hub==0.17.3) (2023.10.0)
Requirement already satisfied: requests in ./python3.9-venv/lib/python3.9/site-packages (from huggingface-hub==0.17.3) (2.31.0)
Requirement already satisfied: tqdm>=4.42.1 in ./python3.9-venv/lib/python3.9/site-packages (from huggingface-hub==0.17.3) (4.66.1)
Requirement already satisfied: pyyaml>=5.1 in ./python3.9-venv/lib/python3.9/site-packages (from huggingface-hub==0.17.3) (6.0.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in ./python3.9-venv/lib/python3.9/site-packages (from huggingface-hub==0.17.3) (4.8.0)
Requirement already satisfied: packaging>=20.9 in ./python3.9-venv/lib/python3.9/site-packages (from huggingface-hub==0.17.3) (23.2)
Requirement already satisfied: charset-normalizer<4,>=2 in ./python3.9-venv/lib/python3.9/site-packages (from requests->huggingface-hub==0.17.3) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in ./python3.9-venv/lib/python3.9/site-packages (from requests->huggingface-hub==0.17.3) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./python3.9-venv/lib/python3.9/site-packages (from requests->huggingface-hub==0.17.3) (2.1.0)
Requirement already satisfied: certifi>=2017.4.17 in ./python3.9-venv/lib/python3.9/site-packages (from requests->huggingface-hub==0.17.3) (2023.11.17)
Downloading huggingface_hub-0.17.3-py3-none-any.whl (295 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 295.0/295.0 kB 7.0 MB/s eta 0:00:00
Installing collected packages: huggingface-hub
Attempting uninstall: huggingface-hub
Found existing installation: huggingface-hub 0.19.4
Uninstalling huggingface-hub-0.19.4:
Successfully uninstalled huggingface-hub-0.19.4
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
datasets 2.15.0 requires huggingface-hub>=0.18.0, but you have huggingface-hub 0.17.3 which is incompatible.
intel-extension-for-transformers 1.3rc2.dev30+g318e5cbf22 requires transformers==4.34.1, but you have transformers 4.33.1 which is incompatible.
Successfully installed huggingface-hub-0.17.3
# Copious quantities of log lines redacted
Using /home/caseyhoward/intel-extension-for-transformers/python3.9-venv/lib/python3.9/site-packages
Finished processing dependencies for intel-extension-for-transformers==1.3rc2.dev30+g318e5cbf22
Running:
# example.py
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
model_name = "Intel/neural-chat-7b-v1-1" # Hugging Face model_id or local model
prompt = "Once upon a time, there existed a little girl,"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
Results in:
Traceback (most recent call last):
# ...
# Stacktrace removed
# ...
File "/home/caseyhoward/intel-extension-for-transformers/python3.9-venv/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 179, in check_imports
raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: einops. Run `pip install einops`
Traceback (most recent call last):
File "/home/caseyhoward/ex.py", line 10, in <module>
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
File "/home/caseyhoward/intel-extension-for-transformers/python3.9-venv/lib/python3.9/site-packages/intel_extension_for_transformers-1.3rc2.dev30+g318e5cbf22-py3.9-linux-x86_64.egg/intel_extension_for_transformers/transformers/modeling/modeling_auto.py", line 179, in from_pretrained
model.init(
File "/home/caseyhoward/intel-extension-for-transformers/python3.9-venv/lib/python3.9/site-packages/intel_extension_for_transformers-1.3rc2.dev30+g318e5cbf22-py3.9-linux-x86_64.egg/intel_extension_for_transformers/llm/runtime/graph/__init__.py", line 122, in init
assert os.path.exists(fp32_bin), "Fail to convert pytorch model"
AssertionError: Fail to convert pytorch model
The AssertionError follows from the einops ImportError above: the modeling file could not be loaded, so the PyTorch-to-runtime conversion never produced its output file. Running
pip install einops
python example.py
Results in:
model_quantize_internal: model size = 25362.62 MB
model_quantize_internal: quant size = 4737.50 MB
ARCH_REQ_XCOMP_PERM XTILE_DATA successful.
AVX:1 AVX2:1 AVX512F:1 AVX_VNNI:1 AVX512_VNNI:1 AMX_INT8:1 AMX_BF16:1 AVX512_BF16:1 AVX512_FP16:1
beam_size: 1, do_sample: 0, top_k: 40, top_p: 0.950000
model.cpp: loading model from runtime_outs/ne_mpt_q_int4_jblas_cint8_g32.bin
init: n_vocab = 50279
init: n_embd = 4096
init: n_mult = 4096
init: n_head = 32
init: n_layer = 32
init: n_rot = 32
init: n_ff = 16384
init: n_parts = 1
load: ne ctx size = 4737.55 MB
load: mem required = 12929.55 MB (+ memory per state)
..................................................................................................
model_init_from_file: support_jblas_kv = 1
model_init_from_file: kv self size = 276.00 MB
Once upon a time, there existed a little girl, who was born in the year of the dragon. She was born in the year of the dragon because her mother was born in the year of the dragon. Her mother was born in the year of the dragon because her grandmother was born in the year of the dragon. Her grandmother was born in the year of the dragon because her grandfather was born in the year of the dragon. Her grandfather was born in the year of the dragon because his father was born in the year of the dragon. His father was born in the year of the dragon because his mother was born in the year of the dragon. Her mother was born in the year of the dragon because her father was born in the year of the dragon. Her father was born in the year of the dragon because his mother was born in the year of the dragon. Her mother was born in the year of the dragon because her father was born in the year of the dragon. Her father was born in the year of the dragon because his mother was born in the year of the dragon. Her mother was born in the year of the dragon because her father was born in the year of the dragon. Her father was born in the year of the dragon because his mother was born in the year of the dragon. Her mother was born in the year of the dragon because her father was born in the year of the dragon. Her father was born in the year of the dragon because his mother was born in the year of the dragon. Her mother was born in
It works.
Based on my experience, I have some thoughts:
- Update intel-extension-for-transformers in PyPI and other repos (or update the install docs to reflect the current build process).
- Document the extra dependencies: einops, huggingface-hub, transformers and datasets (the last due to downgrading huggingface-hub).
Might there be a way to update the package in the PyPI and Conda repos so that pip install intel-extension-for-transformers works? I think it would significantly help adoption of this software and of Intel's AMX capabilities.
Thanks very much for your valuable suggestions! We will update the README and consider fixing the dependency issues. Regards, Bo
I followed exactly what @cphoward did, but I am still encountering a similar error.
2023-12-15 16:13:10 [INFO] Applying Weight Only Quantization.
2023-12-15 16:13:10 [INFO] Using LLM runtime.
Traceback (most recent call last):
File "/hdd4/namch/example.py", line 21, in <module>
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True, cache_dir='/hdd4/namch/.cache')
File "/hdd4/namch/miniconda3/envs/python3.9/lib/python3.9/site-packages/intel_extension_for_transformers/transformers/modeling/modeling_auto.py", line 136, in from_pretrained
quantization_config.post_init_runtime()
File "/hdd4/namch/miniconda3/envs/python3.9/lib/python3.9/site-packages/intel_extension_for_transformers/transformers/utils/quantization_config.py", line 127, in post_init_runtime
raise ValueError(f"weight_dtype must be 'int4', 'int8'.")
ValueError: weight_dtype must be 'int4', 'int8'.
I am uncertain whether my server hardware is unsupported. Here is the hardware information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 20
On-line CPU(s) list: 0-19
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 165
Model name: Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz
Stepping: 5
CPU MHz: 4900.000
CPU max MHz: 5300.0000
CPU min MHz: 800.0000
BogoMIPS: 7399.70
Virtualization: VT-x
L1d cache: 320 KiB
L1i cache: 320 KiB
L2 cache: 2.5 MiB
L3 cache: 20 MiB
NUMA node0 CPU(s): 0-19
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Mitigation; Microcode
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp pku ospke md_clear flush_l1d arch_capabilities
Thank you so much!
@CaoHaiNam Hi, the current problem is probably not your hardware. Please check my comment above in this issue to fix this error.
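For completeness, with a source install you can also request the weight dtype explicitly instead of relying on the load_in_4bit default (a minimal sketch, assuming WeightOnlyQuantConfig is exported by the ITREX build you installed):
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQuantConfig
model_name = "Intel/neural-chat-7b-v1-1"
# Explicitly request 4-bit weight-only quantization
woq_config = WeightOnlyQuantConfig(weight_dtype="int4")
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=woq_config)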
If your hardware does not support it, the error will be reported when you run inference with the model. No worries. Please feel free to comment on this issue if you encounter any problems.
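If you want to check what your CPU supports before running, the relevant instruction-set flags can be listed programmatically (a minimal sketch; Linux only, since it reads /proc/cpuinfo):
# List which SIMD/AMX extensions this CPU advertises
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break
for flag in ("avx2", "avx512f", "avx512_vnni", "amx_int8", "amx_bf16"):
    print(flag, "yes" if flag in flags else "no")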
I'm following the instructions above (the source install steps), but now I get a different error:
{ "name": "KeyError", "message": "'mistral'", "stack": "--------------------------------------------------------------------------- KeyError Traceback (most recent call last) Cell In[3], line 10 7 inputs = tokenizer(prompt, return_tensors=\"pt\").input_ids 8 streamer = TextStreamer(tokenizer) ---> 10 model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True) 11 outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
File ~/.conda/envs/gptexp/lib/python3.10/site-packages/intel_extension_for_transformers-1.4.dev20+ge6ecb21ce5-py3.10-linux-x86_64.egg/intel_extension_for_transformers/transformers/modeling/modeling_auto.py:265, in _BaseQBitsAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs) 262 from intel_extension_for_transformers.llm.runtime.graph import Model 264 model = Model() --> 265 model.init( 266 pretrained_model_name_or_path, 267 weight_dtype=quantization_config.weight_dtype, 268 alg=quantization_config.scheme, 269 group_size=quantization_config.group_size, 270 scale_dtype=quantization_config.scale_dtype, 271 compute_dtype=quantization_config.compute_dtype, 272 use_ggml=quantization_config.use_ggml, 273 use_quant=quantization_config.use_quant, 274 use_gptq=quantization_config.use_gptq, 275 ) 276 return model 277 else:
File ~/.conda/envs/gptexp/lib/python3.10/site-packages/intel_extension_for_transformers-1.4.dev20+ge6ecb21ce5-py3.10-linux-x86_64.egg/intel_extension_for_transformers/llm/runtime/graph/init.py:79, in Model.init(self, model_name, use_quant, use_gptq, quant_kwargs) 78 def init(self, model_name, use_quant=True, use_gptq=False, quant_kwargs): ---> 79 self.config = AutoConfig.from_pretrained(model_name, trust_remote_code=True) 80 self.tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) 81 self.model_type = Model.get_model_type(self.config)
File ~/.conda/envs/gptexp/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:1039, in AutoConfig.from_pretrained(cls, pretrained_model_name_or_path, kwargs) 1037 return config_class.from_pretrained(pretrained_model_name_or_path, kwargs) 1038 elif \"model_type\" in config_dict: -> 1039 config_class = CONFIG_MAPPING[config_dict[\"model_type\"]] 1040 return config_class.from_dict(config_dict, **unused_kwargs) 1041 else: 1042 # Fallback: use pattern matching on the string. 1043 # We go from longer names to shorter names to catch roberta before bert (for instance)
File ~/.conda/envs/gptexp/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:734, in _LazyConfigMapping.getitem(self, key) 732 return self._extra_content[key] 733 if key not in self._mapping: --> 734 raise KeyError(key) 735 value = self._mapping[key] 736 module_name = model_type_to_module_name(key)
KeyError: 'mistral'" }
Note: I'm using Intel Developer Cloud.
Hi, sorry for the late reply; I didn't receive an email notification :(
I can't see the screenshot you shared. Please upload it again.
Please provide more details so that I can reproduce your error: the script, the Hugging Face model name / card ID, and the commands you ran.
@rajivmehtaflex Hi, the likely reason for this error is the transformers version: older releases don't know the 'mistral' model type, which is why you get KeyError: 'mistral'. Please try updating to the latest transformers release.
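To check whether the installed transformers release already knows the mistral model type, a quick check (a minimal sketch; as far as I recall, Mistral support landed in transformers 4.34) is:
import transformers
from transformers.models.auto.configuration_auto import CONFIG_MAPPING
print(transformers.__version__)
# False here means the release predates Mistral support; upgrade with pip install -U transformers
print("mistral" in CONFIG_MAPPING)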
@rajivmehtaflex Hi, this issue has been fixed. I'll close it for now. If you have more questions, please feel free to ask and @ me.
The Transformer Python API section is not working. I've tried Python 3.7, 3.8, 3.10 and 3.11.
I am running this with Ubuntu on Intel Sapphire Rapid CPUs.
The code:
(code block missing from this copy)
results in:
(error output missing from this copy)
I have been unable to get any Transformer-based API examples working.
I have been able to make the scripts work as described, so I know my system hardware and kernel configuration work.
I had originally been trying to get python_api_example.py working, but this too is broken. Some other models work, but it's hit and miss. For example, the model meta-llama/Llama-2-7b-chat-hf also fails: