Need to upgrade transformers to >=v4.35 to fix "No module named 'transformers.modeling_attn_mask_utils'" #10439
2024-03-16 11:17:44,339 - INFO - Converting the current model to bf16 format......
2024-03-16 11:17:44,339 - INFO - BIGDL_OPT_IPEX: True
Traceback (most recent call last):
  File "/home/llm/BigDL/python/llm/dev/benchmark/all-in-one/./run.py", line 1365, in <module>
    run_model(model, api, in_out_pairs, conf['local_model_hub'], conf['warm_up'], conf['num_trials'], conf['num_beams'],
  File "/home/llm/BigDL/python/llm/dev/benchmark/all-in-one/./run.py", line 96, in run_model
    result = run_bigdl_ipex_bf16(repo_id, local_model_hub, in_out_pairs, warm_up, num_trials, num_beams, batch_size)
  File "/home/llm/BigDL/python/llm/dev/benchmark/all-in-one/./run.py", line 1108, in run_bigdl_ipex_bf16
    model = AutoModelForCausalLM.from_pretrained(model_path, load_in_low_bit='bf16', trust_remote_code=True, torch_dtype=torch.bfloat16,
  File "/home/llm/miniconda3/envs/bigdl-cpu/lib/python3.9/site-packages/bigdl/llm/transformers/model.py", line 304, in from_pretrained
    model = cls.load_convert(q_k, optimize_model, *args, **kwargs)
  File "/home/llm/miniconda3/envs/bigdl-cpu/lib/python3.9/site-packages/bigdl/llm/transformers/model.py", line 425, in load_convert
    model = ggml_convert_low_bit(model, qtype, optimize_model,
  File "/home/llm/miniconda3/envs/bigdl-cpu/lib/python3.9/site-packages/bigdl/llm/transformers/convert.py", line 655, in ggml_convert_low_bit
    model = _optimize_ipex(model, qtype)
  File "/home/llm/miniconda3/envs/bigdl-cpu/lib/python3.9/site-packages/bigdl/llm/transformers/convert.py", line 733, in _optimize_ipex
    from transformers.modeling_attn_mask_utils import AttentionMaskConverter
ModuleNotFoundError: No module named 'transformers.modeling_attn_mask_utils'
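
As the title says, the fix is to upgrade transformers in the benchmark environment: transformers.modeling_attn_mask_utils (and its AttentionMaskConverter) only exists in transformers >= 4.35, so the import inside BigDL's _optimize_ipex path fails on older versions. Below is a minimal pre-flight check, sketched for illustration (not part of the repo; the 4.35 threshold is taken from the issue title):

import importlib.metadata
from packaging.version import Version

# Fail fast with an actionable message before the BIGDL_OPT_IPEX path
# attempts the transformers-4.35+ import that produces this traceback.
installed = Version(importlib.metadata.version("transformers"))
if installed < Version("4.35.0"):
    raise RuntimeError(
        f"transformers {installed} is too old for BIGDL_OPT_IPEX: "
        "transformers.modeling_attn_mask_utils was added in v4.35. "
        "Upgrade with: pip install --upgrade 'transformers>=4.35'"
    )

# With a new enough transformers, the import from _optimize_ipex succeeds:
from transformers.modeling_attn_mask_utils import AttentionMaskConverter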