liu-shaojun opened this issue 1 year ago
On spr-01, I followed the all-in-one benchmark to run bigdl-llm on an Intel CPU with Python 3.10:
```bash
conda create --name ziteng-310 python=3.10
conda activate ziteng-310
pip install omegaconf
pip install pandas
pip install --pre --upgrade bigdl-llm[all]
pip install bigdl-nano[pytorch]
source bigdl-nano-init
```
Running the above commands produced the following conda environment:

```
python      3.10.13
bigdl-llm   2.4.0b20231024
bigdl-nano  2.3.0
```
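A quick way to double-check these versions from inside the environment (a small sketch, not part of the benchmark; it only queries installed package metadata):

```python
# Print the interpreter and package versions for the active environment.
import sys
import importlib.metadata as md

print("python", sys.version.split()[0])
for dist in ("bigdl-llm", "bigdl-nano"):
    print(dist, md.version(dist))
```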
I tested chatglm2-6b, Llama-2-7b-chat-hf, Baichuan2-7B-Chat, Llama-2-13b-chat-hf, and mpt-7b-chat. Except for chatglm2-6b, all models run fine with bigdl-llm on CPU with Python 3.10.
When running bigdl-llm on CPU with chatglm2-6b, I got the following error:
```
Traceback (most recent call last):
  File "/root/ziteng/BigDL/python/llm/dev/benchmark/all-in-one/./run.py", line 552, in <module>
    run_model(model, api, conf['in_out_pairs'], conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'], conf['low_bit'])
  File "/root/ziteng/BigDL/python/llm/dev/benchmark/all-in-one/./run.py", line 45, in run_model
    result = run_transformer_int4(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, low_bit)
  File "/root/ziteng/BigDL/python/llm/dev/benchmark/all-in-one/./run.py", line 144, in run_transformer_int4
    model = AutoModelForCausalLM.from_pretrained(model_path, load_in_low_bit=low_bit, trust_remote_code=True,
  File "/root/anaconda3/envs/ziteng-310/lib/python3.10/site-packages/bigdl/llm/transformers/model.py", line 97, in from_pretrained
    model = cls.load_convert(q_k, optimize_model, *args, **kwargs)
  File "/root/anaconda3/envs/ziteng-310/lib/python3.10/site-packages/bigdl/llm/transformers/model.py", line 120, in load_convert
    model = cls.HF_Model.from_pretrained(*args, **kwargs)
  File "/root/anaconda3/envs/ziteng-310/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 496, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers_modules.chatglm2-6b.configuration_chatglm.ChatGLMConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, CodeGenConfig, CpmAntConfig, CTRLConfig, Data2VecTextConfig, ElectraConfig, ErnieConfig, FalconConfig, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, LlamaConfig, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MusicgenConfig, MvpConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, TransfoXLConfig, TrOCRConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig.
```
Running on Python 3.9 hits the same issue, so it appears to be a problem with the model files themselves. A related problem was reported in the ChatGLM repo: https://github.com/THUDM/ChatGLM-6B/issues/37#issuecomment-1704036282. The Hugging Face trainer tends to save only the model, rather than both the model and the tokenizer.
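If the local copy came from a training run, one way to sidestep the incomplete save is to write the tokenizer alongside the model explicitly (a minimal sketch; `model` and `tokenizer` are whatever objects the training script holds, and the output path is hypothetical):

```python
# Hedged sketch: save both the model and the tokenizer into the same directory
# so that a later local load does not fail for lack of tokenizer files.
# "./chatglm2-6b-local" is a hypothetical output directory.
model.save_pretrained("./chatglm2-6b-local")
tokenizer.save_pretrained("./chatglm2-6b-local")
```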
Aside from this issue, there are no other problems running bigdl-llm on CPU with Python 3.10.
It runs OK when I load THUDM/chatglm2-6b directly from Hugging Face. For a local model, please refer to https://github.com/THUDM/ChatGLM-6B/issues/37#issuecomment-1704036282.
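For reference, the BigDL-LLM ChatGLM2 examples load the model through AutoModel rather than AutoModelForCausalLM, which avoids the unrecognized-config error above. A minimal sketch along those lines (the model path can also be a complete local checkout):

```python
# ChatGLM2's config is not registered with AutoModelForCausalLM,
# so load it via bigdl-llm's AutoModel wrapper instead.
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

model_path = "THUDM/chatglm2-6b"
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,
                                  trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```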
On Arc05, I followed https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md to run mpt-7b-chat with BigDL-LLM on Intel GPUs and got the following error:
```
(llm-py310) arda@arda-arc05:~/shaojun/BigDL/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt$ python ./generate.py --repo-id-or-model-path /mnt/disk1/models/mpt-7b-chat/
/opt/anaconda3/envs/llm-py310/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '' If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
Instantiating an MPTForCausalLM model from /home/arda/.cache/huggingface/modules/transformers_modules/modeling_mpt.py
You are using config.init_device='cpu', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.67s/it]
2023-10-30 07:27:39,352 - bigdl.llm.transformers.utils - INFO - Converting the current model to sym_int4 format......
/opt/anaconda3/envs/llm-py310/lib/python3.10/site-packages/transformers/generation/utils.py:1421: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use and modify the model generation configuration (see https://huggingface.co/docs/transformers/generation_strategies#default-text-generation-configuration )
  warnings.warn(
Traceback (most recent call last):
  File "/home/arda/shaojun/BigDL/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/./generate.py", line 80, in <module>
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)
  File "/opt/anaconda3/envs/llm-py310/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3754, in decode
    return self._decode(
  File "/opt/anaconda3/envs/llm-py310/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 593, in _decode
    text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
OverflowError: out of range integral type conversion attempted
```
Running on Python 3.9 hits the same issue. Solution: https://github.com/analytics-zoo/nano/issues/661
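The linked issue tracker may not be publicly accessible, but the failure mode suggests the generated sequence contains token ids the fast tokenizer cannot convert. A hedged sketch of one possible workaround, assuming out-of-range ids are the cause:

```python
# Hedged workaround sketch: drop token ids outside the tokenizer's valid range
# before decoding, assuming such ids are what trigger the OverflowError.
# `output` and `tokenizer` are the objects from generate.py.
token_ids = [tid for tid in output[0].tolist() if 0 <= tid < tokenizer.vocab_size]
output_str = tokenizer.decode(token_ids, skip_special_tokens=True)
```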
The following models have been tested with Python 3.10 on Arc05/xpu; the output matches the expected results described in https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/:
meta-llama/Llama-2-7b-chat-hf
meta-llama/Llama-2-13b-chat-hf
BAAI/AquilaChat-7B
baichuan-inc/Baichuan-13B-Chat
baichuan-inc/Baichuan2-7B-Chat
THUDM/chatglm2-6b
LinkSoul/Chinese-Llama-2-7b
databricks/dolly-v1-6b
google/flan-t5-xxl
mistralai/Mistral-7B-Instruct-v0.1
internlm/internlm-chat-7b-8k
The following models have been tested with Python 3.11 on Arc13/xpu; the output matches the expected results described in https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/:
baichuan-inc/Baichuan2-13B-Chat
meta-llama/Llama-2-7b-chat-hf
meta-llama/Llama-2-13b-chat-hf
Qwen/Qwen-7B-Chat
Qwen/Qwen-14B-Chat
Qwen/Qwen-VL-Chat
databricks/dolly-v1-6b
databricks/dolly-v2-12b
databricks/dolly-v2-7b
internlm/internlm-chat-20b
internlm/internlm-chat-7b-8k
THUDM/chatglm2-6b
Currently, Python 3.12 is not supported for bigdl-llm, since the dependency intel_extension_for_pytorch==2.0.110+xpu only supports up to Python 3.11.
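A small guard that fails fast on unsupported interpreters (a sketch reflecting the constraint above, not an official check in bigdl-llm):

```python
# Fail fast if the interpreter falls outside the range the xpu wheels support.
import sys

if sys.version_info[:2] > (3, 11):
    raise RuntimeError(
        "bigdl-llm's xpu dependency intel_extension_for_pytorch==2.0.110+xpu "
        "supports Python up to 3.11; found %d.%d" % sys.version_info[:2]
    )
```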
Script to test on Python 3.11:

```bash
#!/bin/bash
source bigdl-llm-init
export OMP_NUM_THREADS=48
numactl -C 0-47 -m 0 python $(dirname "$0")/run.py
```
Issue 1 on xpu with Python 3.10 [Fixed after releasing bigdl-core-xe and bigdl-core-xe-esimd for Python 3.10]
On Arc14, I followed https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md to run the example with BigDL-LLM on Intel GPUs.
bigdl-llm==2.4.0b20230810 is installed; it is odd that bigdl-core-xe and bigdl-core-xe-esimd are missing when python=3.10.
I then set the environment variables and ran generate.py, and got an error.
After `pip install protobuf` and rerunning `python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH`, I got another error. There is a transformers issue related to this error: https://github.com/huggingface/transformers/issues/26755. Should we pin a specific transformers version?
Solution: https://github.com/intel-analytics/llm.cpp/pull/128