intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

[Chronos] `bigdl.chronos.forecaster.tcn_forecaster.optimize` #6946

Open smurf-1119 opened 1 year ago

smurf-1119 commented 1 year ago

When I run `bigdl.chronos.forecaster.tcn_forecaster.optimize`, I encounter the following errors:

```
==========================Start Optimization==========================
----------Start test original model (1/11)----------
----------Finish test original model (1/11)----------
----------Start test bf16 model (2/11)----------
Traceback (most recent call last):
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 386, in optimize
    func_test, acce_model, input_sample)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/common/utils.py", line 68, in throughput_calculate_helper
    func(*args)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 378, in func_test
    model(input_sample)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/pytorch/model.py", line 31, in forward
    outputs = self.forward_step(inputs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/amp/bfloat16.py", line 110, in forward_step
    return self.model(inputs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/lightning.py", line 99, in forward
    return self.model(args)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/chronos/pytorch/model_wrapper/normalization.py", line 35, in forward
    y = self.model(x)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/chronos/model/tcn.py", line 142, in forward
    y = self.tcn(x)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/chronos/model/tcn.py", line 100, in forward
    out = self.net(x)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 179, in forward
    self.eps,
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 2439, in batch_norm
    input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: !needs_dynamic_casting<func_t>::check(iter) INTERNAL ASSERT FAILED at "../aten/src/ATen/native/cpu/Loops.h":315, please report a bug to PyTorch.
```
```
----------bf16 failed to forward----------
----------Start test int8 model (3/11)----------
----------Finish test int8 model (3/11)----------
----------Start test jit_fp32_ipex model (4/11)----------
----------Start test jit_fp32_ipex_channels_last model (5/11)----------
Traceback (most recent call last):
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 386, in optimize
    func_test, acce_model, input_sample)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/common/utils.py", line 68, in throughput_calculate_helper
    func(*args)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 378, in func_test
    model(input_sample)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/pytorch/model.py", line 31, in forward
    outputs = self.forward_step(inputs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in forward_step
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in <lambda>
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
RuntimeError: required rank 4 tensor to use channels_last format
----------jit_fp32_ipex_channels_last failed to forward----------
----------Start test jit_bf16_ipex model (6/11)----------
[W LegacyTypeDispatch.h:74] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
----------Finish test jit_bf16_ipex model (6/11)----------
----------Start test jit_bf16_ipex_channels_last model (7/11)----------
Traceback (most recent call last):
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 386, in optimize
    func_test, acce_model, input_sample)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/common/utils.py", line 68, in throughput_calculate_helper
    func(*args)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 378, in func_test
    model(input_sample)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/pytorch/model.py", line 31, in forward
    outputs = self.forward_step(inputs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in forward_step
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in <lambda>
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
RuntimeError: required rank 4 tensor to use channels_last format
----------jit_bf16_ipex_channels_last failed to forward----------
----------Start test openvino_fp32 model (8/11)----------
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /tmp/tmpsj7qrq7a/tmp.xml
[ SUCCESS ] BIN file: /tmp/tmpsj7qrq7a/tmp.bin
[ SUCCESS ] Total execution time: 0.68 seconds.
[ SUCCESS ] Memory consumed: 83 MB.
----------Finish test openvino_fp32 model (8/11)----------
----------Start test openvino_int8 model (9/11)----------
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /tmp/tmpw9j21xnv/tmp.xml
[ SUCCESS ] BIN file: /tmp/tmpw9j21xnv/tmp.bin
[ SUCCESS ] Total execution time: 0.66 seconds.
[ SUCCESS ] Memory consumed: 83 MB.
----------Finish test openvino_int8 model (9/11)----------
----------Start test onnxruntime_fp32 model (10/11)----------
----------Finish test onnxruntime_fp32 model (10/11)----------
----------Start test onnxruntime_int8_qlinear model (11/11)----------
----------Finish test onnxruntime_int8_qlinear model (11/11)----------
```
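The two channels_last failures match how PyTorch defines that memory format: `torch.channels_last` only applies to rank-4 (NCHW) tensors, while the TCN forecaster's input is rank 3 (batch, lookback, features). A minimal standalone sketch reproducing the error (shapes chosen to mirror this repro, not taken from Chronos internals):

```python
import torch

# TCN forecaster input is rank 3: (batch, lookback, features)
x3 = torch.randn(32, 48, 1)
try:
    x3.to(memory_format=torch.channels_last)
except RuntimeError as e:
    print(e)  # "required rank 4 tensor to use channels_last format"

# channels_last is only defined for rank-4 (NCHW) tensors
x4 = torch.randn(32, 1, 48, 1)
y = x4.to(memory_format=torch.channels_last)
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```

So the `*_channels_last` variants cannot work for this model unless the optimizer skips the channels_last conversion for non-4D inputs.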

==========================Optimization Results==========================


| method | status | latency(ms) | accuracy |
|---|---|---|---|
| original | successful | 0.786 | 0.021 |
| bf16 | fail to forward | None | None |
| int8 | successful | 1.394 | 0.021 |
| jit_fp32_ipex | early stopped | 26.987 | None |
| jit_fp32_ipex_channels_last | fail to forward | None | None |
| jit_bf16_ipex | successful | 0.407 | 0.021 |
| jit_bf16_ipex_channels_last | fail to forward | None | None |
| openvino_fp32 | successful | 0.185 | not recomputed |
| openvino_int8 | successful | 0.184 | 0.272 |
| onnxruntime_fp32 | successful | 0.076 | not recomputed |
| onnxruntime_int8_qlinear | successful | 0.109 | 0.022 |

Optimization cost 21.8s in total.
===========================Stop Optimization===========================

The code is as follows:

```python
from bigdl.chronos.data.repo_dataset import get_public_dataset
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import numpy as np

tsdata_train, tsdata_val, _ = get_public_dataset(name='nyc_taxi')

# impute, scale (fitting the scaler on the training split only), and roll
stand = StandardScaler()
for tsdata in [tsdata_train, tsdata_val]:
    tsdata.impute()\
          .scale(stand, fit=tsdata is tsdata_train)\
          .roll(lookback=48, horizon=1)

train_data = tsdata_train
val_data = tsdata_val

from bigdl.chronos.forecaster.tcn_forecaster import TCNForecaster

forecaster = TCNForecaster(past_seq_len=48,
                           future_seq_len=1,
                           input_feature_num=1,
                           output_feature_num=1,
                           lr=0.001)
print(forecaster.num_processes)
forecaster.num_processes = 1
forecaster.fit(train_data, epochs=3, batch_size=32)
forecaster.optimize(train_data, val_data, thread_num=1)

# outputs = forecaster.predict(tsdata_val)
# gt = tsdata_val.to_numpy()[1]
# print(np.sum((gt - outputs)**2)/len(gt))

# pred_unscale = tsdata_val.unscale_numpy(outputs)
# groundtruth_unscale = tsdata_val.unscale_numpy(gt)

# plt.figure(figsize=(24,6))
# plt.plot(pred_unscale[:,:,0])
# plt.plot(groundtruth_unscale[:,:,0])
# plt.legend(["prediction", "ground truth"])
# plt.savefig('/disk3/qp/tcn_multiprocessing/img')
```
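As an aside on the preprocessing above: the `fit=tsdata is tsdata_train` argument fits the `StandardScaler` on the training split only and reuses its statistics for validation, which avoids leaking validation statistics into the scaling. A minimal standalone sketch of that pattern with plain scikit-learn (the toy arrays are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[1.0], [2.0], [3.0]])
val = np.array([[4.0], [5.0]])

scaler = StandardScaler().fit(train)    # statistics come from train only
train_scaled = scaler.transform(train)  # uses train mean/std
val_scaled = scaler.transform(val)      # same mean/std, no refit

print(scaler.mean_)  # [2.]
```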
TheaperDeng commented 1 year ago

Only `jit_fp32_ipex` seems to be a problem; will take a look.