yuqinie98 / PatchTST

An official implementation of PatchTST: "A Time Series is Worth 64 Words: Long-term Forecasting with Transformers." (ICLR 2023) https://arxiv.org/abs/2211.14730
Apache License 2.0

Unsatisfactory results on a regression task : ( need suggestions. #78

Closed Valhir924 closed 8 months ago

Valhir924 commented 8 months ago

Thanks for creating such a great model : )

I made an effort to transform it into a regression version and encountered some issues.

I added a flatten layer and MLPs on top of the original outputs of the PatchTST forecasting model to adapt it to a regression task, and I removed the denorm step of the RevIN layer, since my input data and output targets are not on the same order of magnitude.
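For reference, the modification described above could look like the following minimal sketch. The class name, hidden size, and dropout rate are my own illustration, not the actual code from this issue; it only assumes the forecasting model emits a tensor of shape `[batch, pred_len, n_vars]`.

```python
import torch
import torch.nn as nn

class RegressionHead(nn.Module):
    """Hypothetical head: flatten the forecast output [B, pred_len, n_vars]
    and map it through an MLP to a fixed number of regression targets."""

    def __init__(self, pred_len: int, n_vars: int, hidden: int = 128, n_targets: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(start_dim=1),               # -> [B, pred_len * n_vars]
            nn.Linear(pred_len * n_vars, hidden),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, n_targets),          # -> [B, n_targets]
        )

    def forward(self, forecast: torch.Tensor) -> torch.Tensor:
        return self.net(forecast)

# Shapes matching the parameters below: pred_len=16, enc_in=13
head = RegressionHead(pred_len=16, n_vars=13)
out = head(torch.randn(4, 16, 13))
print(out.shape)  # torch.Size([4, 1])
```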

However, as training progresses, the model's outputs collapse to an almost constant value. That's ... frustrating.

The parameters of PatchTST part are listed below.

```
seq_len=512, label_len=0, pred_len=16, fc_dropout=0.3, head_dropout=0.0, patch_len=32, stride=16, train_epochs=50, dropout=0.05, enc_in=13, e_layers=3, d_model=256, n_heads=16, d_ff=512, individual=1, subtract_last=False, revin=True, decomposition=False, affine=False, activation='gelu', batch_size=128, patience=100, learning_rate=0.01, loss='mse', lradj='TST', pct_start=0.3, use_amp=False
```

I've studied this issue for a couple of days and am still confused about its cause. Is the data not informative enough to extract features from, did I set the wrong parameters, or is my modification of the original PatchTST incorrect?

I'm totally confused. Here stands a noob waiting for help...

namctin commented 8 months ago

Hi Valhir924, we have not tested the regression task thoroughly. What dataset did you use for this task? We will check and get back to you.

Valhir924 commented 8 months ago

Thanks for the reply! @namctin

: )

I'm sorry, for privacy reasons I'm not sure whether I can share it with you, since it contains real-life data sampled from volunteers for research purposes.

Recently I've tried models with convolutional neural networks. Although the validation loss still comes out high, the training loss decreases to an acceptable level. However, when I trained PatchTST, the training loss stayed significant, as I mentioned above: "the model's outputs come out as an almost constant value". That's why I suspect the parameters are set wrong.

By the way, when I set decomposition=True, this error is reported:

```
Traceback (most recent call last):
  File "/home/user/Disk/codes/My_run_longEXP.py", line 85, in <module>
    exp.train(setting)
  File "/home/user/Disk/codes/exp/My_exp.py", line 192, in train
    outputs = self.model(batch_x)
  File "/home/user/anaconda3/envs/new_processor/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/Disk/codes/models/edited_TST.py", line 61, in forward
    x_dynamic = self.patchTST(x_dynamic)
  File "/home/user/anaconda3/envs/new_processor/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/Disk/codes/models/PatchTST.py", line 82, in forward
    res_init, trend_init = self.decomp_module(x)
  File "/home/user/anaconda3/envs/new_processor/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/Disk/codes/layers/PatchTST_layers.py", line 54, in forward
    res = x - moving_mean
RuntimeError: The size of tensor a (512) must match the size of tensor b (511) at non-singleton dimension 1
```

Do you know the reason for it?
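For what it's worth, this kind of off-by-one (512 vs 511) in `res = x - moving_mean` is typical of the Autoformer-style moving-average decomposition when the kernel size is even: the padding of `(kernel_size - 1) // 2` replicated steps on each end only preserves the sequence length for odd kernels. A small sketch reproducing the arithmetic (assuming that padding scheme, which `series_decomp` in `PatchTST_layers.py` follows):

```python
import torch
import torch.nn as nn

def decomp_trend_len(seq_len: int, kernel_size: int) -> int:
    """Length of the trend produced by an Autoformer-style moving average:
    pad both ends with (kernel_size - 1) // 2 replicated values, then
    average-pool with stride 1 and no internal padding."""
    x = torch.zeros(1, seq_len, 1)              # (batch, length, channels)
    pad = (kernel_size - 1) // 2
    front = x[:, :1, :].repeat(1, pad, 1)
    end = x[:, -1:, :].repeat(1, pad, 1)
    padded = torch.cat([front, x, end], dim=1)  # length = seq_len + 2 * pad
    avg = nn.AvgPool1d(kernel_size=kernel_size, stride=1, padding=0)
    trend = avg(padded.permute(0, 2, 1)).permute(0, 2, 1)
    return trend.shape[1]

print(decomp_trend_len(512, 25))  # odd kernel: length preserved -> 512
print(decomp_trend_len(512, 24))  # even kernel: off by one -> 511
```

So if the moving-average kernel size in your setup is even, the trend comes out one step short and the subtraction fails exactly as in the traceback; an odd kernel size would keep the lengths aligned.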