When I ran `bigdl.chronos.forecaster.tcn_forecaster.optimize`, I encountered the following errors:
==========================Start Optimization==========================
----------Start test original model (1/11)----------
----------Finish test original model (1/11)----------
----------Start test bf16 model (2/11)----------
Traceback (most recent call last):
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 386, in optimize
    func_test, acce_model, input_sample)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/common/utils.py", line 68, in throughput_calculate_helper
    func(*args)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 378, in func_test
    model(*input_sample)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/pytorch/model.py", line 31, in forward
    outputs = self.forward_step(*inputs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/amp/bfloat16.py", line 110, in forward_step
    return self.model(*inputs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/lightning.py", line 99, in forward
    return self.model(*args)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/chronos/pytorch/model_wrapper/normalization.py", line 35, in forward
    y = self.model(x)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/chronos/model/tcn.py", line 142, in forward
    y = self.tcn(x)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/chronos/model/tcn.py", line 100, in forward
    out = self.net(x)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 179, in forward
    self.eps,
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 2439, in batch_norm
    input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: !needs_dynamic_casting<func_t>::check(iter) INTERNAL ASSERT FAILED at "../aten/src/ATen/native/cpu/Loops.h":315, please report a bug to PyTorch.
----------bf16 failed to forward----------
----------Start test int8 model (3/11)----------
----------Finish test int8 model (3/11)----------
----------Start test jit_fp32_ipex model (4/11)----------
----------Start test jit_fp32_ipex_channels_last model (5/11)----------
Traceback (most recent call last):
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 386, in optimize
    func_test, acce_model, input_sample)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/common/utils.py", line 68, in throughput_calculate_helper
    func(*args)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 378, in func_test
    model(*input_sample)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/pytorch/model.py", line 31, in forward
    outputs = self.forward_step(*inputs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in forward_step
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in <lambda>
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
RuntimeError: required rank 4 tensor to use channels_last format
----------jit_fp32_ipex_channels_last failed to forward----------
----------Start test jit_bf16_ipex model (6/11)----------
[W LegacyTypeDispatch.h:74] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
----------Finish test jit_bf16_ipex model (6/11)----------
----------Start test jit_bf16_ipex_channels_last model (7/11)----------
Traceback (most recent call last):
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 386, in optimize
    func_test, acce_model, input_sample)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/common/utils.py", line 68, in throughput_calculate_helper
    func(*args)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 378, in func_test
    model(*input_sample)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/pytorch/model.py", line 31, in forward
    outputs = self.forward_step(*inputs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in forward_step
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in <lambda>
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
RuntimeError: required rank 4 tensor to use channels_last format
----------jit_bf16_ipex_channels_last failed to forward----------
----------Start test openvino_fp32 model (8/11)----------
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /tmp/tmpsj7qrq7a/tmp.xml
[ SUCCESS ] BIN file: /tmp/tmpsj7qrq7a/tmp.bin
[ SUCCESS ] Total execution time: 0.68 seconds.
[ SUCCESS ] Memory consumed: 83 MB.
----------Finish test openvino_fp32 model (8/11)----------
----------Start test openvino_int8 model (9/11)----------
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /tmp/tmpw9j21xnv/tmp.xml
[ SUCCESS ] BIN file: /tmp/tmpw9j21xnv/tmp.bin
[ SUCCESS ] Total execution time: 0.66 seconds.
[ SUCCESS ] Memory consumed: 83 MB.
----------Finish test openvino_int8 model (9/11)----------
----------Start test onnxruntime_fp32 model (10/11)----------
----------Finish test onnxruntime_fp32 model (10/11)----------
----------Start test onnxruntime_int8_qlinear model (11/11)----------
----------Finish test onnxruntime_int8_qlinear model (11/11)----------
==========================Optimization Results==========================
Optimization cost 21.8s in total.
===========================Stop Optimization===========================
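
Two of the failures can be probed outside Chronos. The bf16 failure looks like bfloat16 activations reaching the TCN's float32 BatchNorm1d layers on CPU; the probe below is a minimal sketch of that suspected cause (the channel count and shapes are illustrative, not taken from the issue), and it may trip the same mixed-dtype batch_norm problem on torch 1.12:

    import torch

    # float32 BatchNorm1d weights/running stats, as inside the TCN model
    bn = torch.nn.BatchNorm1d(30).eval()

    # bfloat16 input, mimicking what the bf16 accelerated path feeds in
    x = torch.randn(32, 30, 48, dtype=torch.bfloat16)

    with torch.no_grad():
        bn(x)  # mixed bf16/fp32 batch_norm; may raise on torch 1.12 CPU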
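The two channels_last failures are expected for this model rather than a bug in it: torch.channels_last is only defined for rank-4 (NCHW) tensors (rank-5 tensors use torch.channels_last_3d), while a TCN forecaster consumes rank-3 (batch, channels, length) input, so the conversion in ipex_inference_model.py necessarily raises. A quick demonstration:

    import torch

    x4 = torch.randn(8, 3, 224, 224)  # rank-4 image-style batch
    x3 = torch.randn(8, 30, 48)       # rank-3 time-series batch, like TCN input

    x4.to(memory_format=torch.channels_last)  # OK: defined for 4-D tensors
    try:
        x3.to(memory_format=torch.channels_last)
    except RuntimeError as e:
        print(e)  # required rank 4 tensor to use channels_last format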
The code is as follows:
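(The original snippet is not reproduced here. As a stand-in, below is a minimal sketch of a typical TCNForecaster workflow that exercises optimize(); the constructor arguments, synthetic data, and the optimize() call are illustrative assumptions that may differ from the actual code and across BigDL versions.)

    import numpy as np
    from bigdl.chronos.forecaster.tcn_forecaster import TCNForecaster

    # Illustrative synthetic data: (samples, past_seq_len, features) inputs
    # and (samples, future_seq_len, features) targets
    x = np.random.randn(1000, 48, 1).astype(np.float32)
    y = np.random.randn(1000, 5, 1).astype(np.float32)

    forecaster = TCNForecaster(past_seq_len=48,
                               future_seq_len=5,
                               input_feature_num=1,
                               output_feature_num=1)
    forecaster.fit((x, y), epochs=1)

    # Sweeps the accelerated variants (bf16, int8, ipex/jit, openvino,
    # onnxruntime, ...) seen in the log above; the exact signature varies
    # across BigDL versions
    forecaster.optimize(train_data=(x, y))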