When I run bigdl.chronos.forecaster.tcn_forecaster.optimize, I get the following errors:
==========================Start Optimization==========================
----------Start test original model (1/11)----------
----------Finish test original model (1/11)----------
----------Start test bf16 model (2/11)----------
Traceback (most recent call last):
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 386, in optimize
    func_test, acce_model, input_sample)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/common/utils.py", line 68, in throughput_calculate_helper
    func(*args)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 378, in func_test
    model(input_sample)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/pytorch/model.py", line 31, in forward
    outputs = self.forward_step(*inputs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/amp/bfloat16.py", line 110, in forward_step
    return self.model(*inputs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/lightning.py", line 99, in forward
    return self.model(*args)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/chronos/pytorch/model_wrapper/normalization.py", line 35, in forward
    y = self.model(x)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/chronos/model/tcn.py", line 142, in forward
    y = self.tcn(x)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/chronos/model/tcn.py", line 100, in forward
    out = self.net(x)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 179, in forward
    self.eps,
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 2439, in batch_norm
    input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: !needs_dynamic_casting::check(iter) INTERNAL ASSERT FAILED at "../aten/src/ATen/native/cpu/Loops.h":315, please report a bug to PyTorch.
----------bf16 failed to forward----------
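(The traceback above ends inside batchnorm.py, so the bf16 failure appears to come from BatchNorm1d receiving mixed float32/bfloat16 tensors on CPU; the needs_dynamic_casting internal assert was relaxed in later PyTorch releases. A minimal sketch of that pattern, assuming the TCN's normalization layers are plain nn.BatchNorm1d -- on newer PyTorch builds the call may simply succeed:)

```python
import torch
import torch.nn as nn

# The bf16 option feeds bfloat16 activations through a float32 model, so
# BatchNorm1d sees mixed dtypes. On the PyTorch build in the log above,
# the mixed-dtype CPU kernel hits the needs_dynamic_casting internal assert.
bn = nn.BatchNorm1d(8).eval()
x = torch.randn(4, 8, 16).to(torch.bfloat16)  # (batch, channels, time)

try:
    y = bn(x)
    print(y.shape)  # newer PyTorch builds handle the mixed dtypes
except RuntimeError as e:
    print(e)        # older builds fail as in the log above
```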
----------Start test int8 model (3/11)----------
----------Finish test int8 model (3/11)----------
----------Start test jit_fp32_ipex model (4/11)----------
----------Start test jit_fp32_ipex_channels_last model (5/11)----------
Traceback (most recent call last):
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 386, in optimize
    func_test, acce_model, input_sample)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/common/utils.py", line 68, in throughput_calculate_helper
    func(*args)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 378, in func_test
    model(input_sample)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/pytorch/model.py", line 31, in forward
    outputs = self.forward_step(*inputs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in forward_step
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in <lambda>
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
RuntimeError: required rank 4 tensor to use channels_last format
----------jit_fp32_ipex_channels_last failed to forward----------
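(The rank-4 error is inherent to torch.channels_last: it is an NCHW memory format defined only for 4-D tensors, while a TCN forecaster's input is 3-D, i.e. (batch, channels, time). A minimal sketch:)

```python
import torch

# channels_last is an NCHW memory format and requires a 4-D tensor;
# a TCN's 3-D (batch, channels, time) input cannot use it
x3d = torch.randn(2, 3, 16)       # TCN-style input
x4d = torch.randn(2, 3, 16, 16)   # image-style input

try:
    x3d.to(memory_format=torch.channels_last)
    ok_3d = True
except RuntimeError:
    # raises: "required rank 4 tensor to use channels_last format"
    ok_3d = False

y = x4d.to(memory_format=torch.channels_last)
print(ok_3d)                                               # False
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```

So the two *_channels_last options failing to forward is expected for 3-D time-series input and can be ignored.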
----------Start test jit_bf16_ipex model (6/11)----------
[W LegacyTypeDispatch.h:74] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
----------Finish test jit_bf16_ipex model (6/11)----------
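(The AutoNonVariableTypeMode deprecation warning above comes from the IPEX/JIT path, not from user code, and is harmless here. It points at c10::InferenceMode as the supported way to mark inference-only work; the Python-level counterpart is torch.inference_mode(), illustrated below:)

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2).eval()

# inference_mode is the Python equivalent of c10::InferenceMode:
# it disables autograd tracking entirely for inference-only workloads
with torch.inference_mode():
    out = model(torch.randn(3, 4))

print(out.requires_grad)   # False
print(out.is_inference())  # True
```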
----------Start test jit_bf16_ipex_channels_last model (7/11)----------
Traceback (most recent call last):
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 386, in optimize
    func_test, acce_model, input_sample)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/common/utils.py", line 68, in throughput_calculate_helper
    func(*args)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 378, in func_test
    model(input_sample)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/pytorch/model.py", line 31, in forward
    outputs = self.forward_step(*inputs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in forward_step
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in <lambda>
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
RuntimeError: required rank 4 tensor to use channels_last format
----------jit_bf16_ipex_channels_last failed to forward----------
----------Start test openvino_fp32 model (8/11)----------
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /tmp/tmpsj7qrq7a/tmp.xml
[ SUCCESS ] BIN file: /tmp/tmpsj7qrq7a/tmp.bin
[ SUCCESS ] Total execution time: 0.68 seconds.
[ SUCCESS ] Memory consumed: 83 MB.
----------Finish test openvino_fp32 model (8/11)----------
----------Start test openvino_int8 model (9/11)----------
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /tmp/tmpw9j21xnv/tmp.xml
[ SUCCESS ] BIN file: /tmp/tmpw9j21xnv/tmp.bin
[ SUCCESS ] Total execution time: 0.66 seconds.
[ SUCCESS ] Memory consumed: 83 MB.
----------Finish test openvino_int8 model (9/11)----------
----------Start test onnxruntime_fp32 model (10/11)----------
----------Finish test onnxruntime_fp32 model (10/11)----------
----------Start test onnxruntime_int8_qlinear model (11/11)----------
----------Finish test onnxruntime_int8_qlinear model (11/11)----------
==========================Optimization Results==========================
Optimization cost 21.8s in total.
===========================Stop Optimization===========================
The code is as follows: