intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

[Chronos] `bigdl.chronos.forecaster.tcn_forecaster.optimize` #6946

Open smurf-1119 opened 1 year ago

smurf-1119 commented 1 year ago

When I run `bigdl.chronos.forecaster.tcn_forecaster.optimize`, I encounter the following errors:

```
==========================Start Optimization==========================
----------Start test original model (1/11)----------
----------Finish test original model (1/11)----------
----------Start test bf16 model (2/11)----------
Traceback (most recent call last):
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 386, in optimize
    func_test, acce_model, input_sample)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/common/utils.py", line 68, in throughput_calculate_helper
    func(*args)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 378, in func_test
    model(*input_sample)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/pytorch/model.py", line 31, in forward
    outputs = self.forward_step(*inputs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/amp/bfloat16.py", line 110, in forward_step
    return self.model(*inputs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/lightning.py", line 99, in forward
    return self.model(*args)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/chronos/pytorch/model_wrapper/normalization.py", line 35, in forward
    y = self.model(x)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/chronos/model/tcn.py", line 142, in forward
    y = self.tcn(x)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/chronos/model/tcn.py", line 100, in forward
    out = self.net(x)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 179, in forward
    self.eps,
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/functional.py", line 2439, in batch_norm
    input, weight, bias, running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: !needs_dynamic_casting<func_t>::check(iter) INTERNAL ASSERT FAILED at "../aten/src/ATen/native/cpu/Loops.h":315, please report a bug to PyTorch.
----------bf16 failed to forward----------
----------Start test int8 model (3/11)----------
----------Finish test int8 model (3/11)----------
----------Start test jit_fp32_ipex model (4/11)----------
----------Start test jit_fp32_ipex_channels_last model (5/11)----------
Traceback (most recent call last):
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 386, in optimize
    func_test, acce_model, input_sample)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/common/utils.py", line 68, in throughput_calculate_helper
    func(*args)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 378, in func_test
    model(*input_sample)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/pytorch/model.py", line 31, in forward
    outputs = self.forward_step(*inputs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in forward_step
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in <lambda>
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
RuntimeError: required rank 4 tensor to use channels_last format
----------jit_fp32_ipex_channels_last failed to forward----------
----------Start test jit_bf16_ipex model (6/11)----------
[W LegacyTypeDispatch.h:74] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
----------Finish test jit_bf16_ipex model (6/11)----------
----------Start test jit_bf16_ipex_channels_last model (7/11)----------
Traceback (most recent call last):
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 386, in optimize
    func_test, acce_model, input_sample)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/common/utils.py", line 68, in throughput_calculate_helper
    func(*args)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/pytorch/inference/optimizer.py", line 378, in func_test
    model(*input_sample)
  File "/home/cpx/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/utils/inference/pytorch/model.py", line 31, in forward
    outputs = self.forward_step(*inputs)
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in forward_step
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
  File "/disk3/miniconda3/envs/qp/lib/python3.7/site-packages/bigdl/nano/deps/ipex/ipex_inference_model.py", line 105, in <lambda>
    inputs = tuple(map(lambda x: x.to(memory_format=torch.channels_last), inputs))
RuntimeError: required rank 4 tensor to use channels_last format
----------jit_bf16_ipex_channels_last failed to forward----------
----------Start test openvino_fp32 model (8/11)----------
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /tmp/tmpsj7qrq7a/tmp.xml
[ SUCCESS ] BIN file: /tmp/tmpsj7qrq7a/tmp.bin
[ SUCCESS ] Total execution time: 0.68 seconds.
[ SUCCESS ] Memory consumed: 83 MB.
----------Finish test openvino_fp32 model (8/11)----------
----------Start test openvino_int8 model (9/11)----------
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /tmp/tmpw9j21xnv/tmp.xml
[ SUCCESS ] BIN file: /tmp/tmpw9j21xnv/tmp.bin
[ SUCCESS ] Total execution time: 0.66 seconds.
[ SUCCESS ] Memory consumed: 83 MB.
----------Finish test openvino_int8 model (9/11)----------
----------Start test onnxruntime_fp32 model (10/11)----------
----------Finish test onnxruntime_fp32 model (10/11)----------
----------Start test onnxruntime_int8_qlinear model (11/11)----------
----------Finish test onnxruntime_int8_qlinear model (11/11)----------
```

```
==========================Optimization Results==========================

 method                        status            latency(ms)    accuracy
 original                      successful        0.786          0.021
 bf16                          fail to forward   None           None
 int8                          successful        1.394          0.021
 jit_fp32_ipex                 early stopped     26.987         None
 jit_fp32_ipex_channels_last   fail to forward   None           None
 jit_bf16_ipex                 successful        0.407          0.021
 jit_bf16_ipex_channels_last   fail to forward   None           None
 openvino_fp32                 successful        0.185          not recomputed
 openvino_int8                 successful        0.184          0.272
 onnxruntime_fp32              successful        0.076          not recomputed
 onnxruntime_int8_qlinear      successful        0.109          0.022

Optimization cost 21.8s in total.
===========================Stop Optimization===========================
```
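
As an aside, the two channels_last failures look expected rather than like bugs: `torch.channels_last` is a rank-4 (NCHW) memory format, while the TCN input here is rank-3 `(batch, feature, lookback)`. A minimal sketch that triggers the same error (the shape is taken from my forecaster config below):

```python
import torch

# channels_last is an NCHW (rank-4) memory format; a rank-3 TCN input
# cannot be converted, which is exactly what the traceback reports.
x = torch.randn(32, 1, 48)  # (batch, input_feature_num, past_seq_len)
try:
    x.to(memory_format=torch.channels_last)
except RuntimeError as e:
    print(e)  # "required rank 4 tensor to use channels_last format"
```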

The code is as follows:

```python
from bigdl.chronos.data.repo_dataset import get_public_dataset
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import numpy as np

# Load the built-in nyc_taxi dataset and preprocess: impute, scale
# (fitting the scaler on train only), then roll into (lookback, horizon) samples.
tsdata_train, tsdata_val, _ = get_public_dataset(name='nyc_taxi')

stand = StandardScaler()
for tsdata in [tsdata_train, tsdata_val]:
    tsdata.impute()\
          .scale(stand, fit=tsdata is tsdata_train)\
          .roll(lookback=48, horizon=1)

train_data = tsdata_train
val_data = tsdata_val

from bigdl.chronos.forecaster.tcn_forecaster import TCNForecaster

forecaster = TCNForecaster(past_seq_len=48,
                           future_seq_len=1,
                           input_feature_num=1,
                           output_feature_num=1,
                           lr=0.001)
print(forecaster.num_processes)
forecaster.num_processes = 1
forecaster.fit(train_data, epochs=3, batch_size=32)
forecaster.optimize(train_data, val_data, thread_num=1)

# Evaluation/plotting (commented out while debugging optimize):
# outputs = forecaster.predict(tsdata_val)
# gt = tsdata_val.to_numpy()[1]
# print(np.sum((gt - outputs)**2)/len(gt))

# pred_unscale = tsdata_val.unscale_numpy(outputs)
# groundtruth_unscale = tsdata_val.unscale_numpy(gt)

# plt.figure(figsize=(24, 6))
# plt.plot(pred_unscale[:, :, 0])
# plt.plot(groundtruth_unscale[:, :, 0])
# plt.legend(["prediction", "ground truth"])
# plt.savefig('/disk3/qp/tcn_multiprocessing/img')
```
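
The bf16 failure, by contrast, looks like a dtype-mixing problem in `BatchNorm1d` on CPU rather than anything Chronos-specific. A speculative standalone reproduction, assuming the bf16 wrapper feeds bfloat16 activations into a module whose parameters and running stats are still float32 (the channel count 30 is only a guess at the TCN hidden size):

```python
import torch
import torch.nn as nn

# Guess at the failing pattern: float32 BatchNorm1d stats vs. bfloat16 input.
# On the PyTorch 1.12 build in the traceback, this mixed-dtype batch_norm
# can hit the same "needs_dynamic_casting ... INTERNAL ASSERT" in cpu/Loops.h.
bn = nn.BatchNorm1d(30)                         # assumed TCN block channel count
bn.eval()                                       # inference path, uses running stats
x = torch.randn(32, 30, 48).to(torch.bfloat16)  # (batch, channels, seq_len)
try:
    bn(x)
except RuntimeError as e:
    print(e)
```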
TheaperDeng commented 1 year ago

Only jit_fp32_ipex seems to be a real problem; will have a look.
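
In the meantime, note that `optimize` still completes: methods that fail to forward are simply skipped in the comparison. If I remember the Chronos behavior correctly (please double-check against the docs for your version), the best surviving method (onnxruntime_fp32 in your run) is picked up automatically by later calls:

```python
# Sketch under the assumption above: after optimize() returns, predict()
# and evaluate() reuse the best accelerated model found, no extra flags.
preds = forecaster.predict(val_data)
mse = forecaster.evaluate(val_data)
```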