thuml / Autoformer

About: Code release for "Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting" (NeurIPS 2021), https://arxiv.org/abs/2106.13008
MIT License

The prediction performance of the last time step is significantly worse than other time steps. #94

Closed · zezhishao closed this issue 2 years ago

zezhishao commented 2 years ago

Hi, Autoformer is amazing work. I have been working with it recently and came across an interesting (and confusing) observation.

I set the lengths of both the input and the output time series to 336 and evaluated the performance of Autoformer.

Unlike the original Autoformer evaluation, which reports metrics averaged over the whole prediction horizon, I measured the performance at each individual time step.

However, I found that the performance at the last time step is significantly worse than at the other time steps. This phenomenon does not occur in other models, e.g., Informer.

I am not sure whether this is inherent to Autoformer or a "bug". I think this phenomenon might suggest that the performance of Autoformer can be improved further.

zezhishao commented 2 years ago

Here are the changes to the code.
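
The actual diff is not reproduced above, so here is only a rough sketch of the kind of per-horizon evaluation I am describing. It assumes `preds` and `trues` are the stacked prediction and ground-truth arrays produced at the end of the repo's test routine, with the shapes printed in the log below (e.g. `(2528, 336, 7)`); the helper name is mine, not part of the repo.

```python
import numpy as np

def mae_per_horizon(preds, trues,
                    horizons=(12, 24, 48, 96, 192, 288, 333, 334, 335, 336)):
    # preds, trues: (num_windows, pred_len, num_variables), e.g. (2528, 336, 7).
    # Horizon h is the h-th predicted step, i.e. index h - 1 on the time axis.
    return {h: float(np.mean(np.abs(preds[:, h - 1, :] - trues[:, h - 1, :])))
            for h in horizons}

# Called once after preds/trues are assembled:
# for h, mae in mae_per_horizon(preds, trues).items():
#     print(f"Evaluate best model on test data for horizon {h}, MAE: {mae:.4f}")
```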

Here is the result:

Args in experiment:
Namespace(is_training=1, model_id='test', model='Autoformer', data='ETTh1', root_path='./data/ETT/', data_path='ETTh1.csv', features='M', target='OT', freq='h', checkpoints='./checkpoints/', seq_len=336, label_len=168, pred_len=336, bucket_size=4, n_hashes=4, enc_in=7, dec_in=7, c_out=7, d_model=512, n_heads=8, e_layers=2, d_layers=1, d_ff=2048, moving_avg=25, factor=1, distil=True, dropout=0.05, embed='timeF', activation='gelu', output_attention=False, do_predict=False, num_workers=10, itr=1, train_epochs=10, batch_size=32, patience=3, learning_rate=0.0001, des='test', loss='mse', lradj='type1', use_amp=False, use_gpu=True, gpu=0, use_multi_gpu=False, devices='0,1,2,3')
Use GPU: cuda:0
>>>>>>>start training : test_Autoformer_ETTh1_ftM_sl336_ll168_pl336_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue_test_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 7969
val 2545
test 2545
        iters: 100, epoch: 1 | loss: 0.5396609
        speed: 0.1091s/iter; left time: 260.9113s
        iters: 200, epoch: 1 | loss: 0.4733845
        speed: 0.0967s/iter; left time: 221.4314s
Epoch: 1 cost time: 25.440151691436768
Epoch: 1, Steps: 249 | Train Loss: 0.5041785 Vali Loss: 1.3170781 Test Loss: 0.4933851
Validation loss decreased (inf --> 1.317078).  Saving model ...
Updating learning rate to 0.0001
        iters: 100, epoch: 2 | loss: 0.4186843
        speed: 0.3271s/iter; left time: 700.6119s
        iters: 200, epoch: 2 | loss: 0.3768386
        speed: 0.0976s/iter; left time: 199.3275s
Epoch: 2 cost time: 24.865762948989868
Epoch: 2, Steps: 249 | Train Loss: 0.4074220 Vali Loss: 1.4148206 Test Loss: 0.5779386
EarlyStopping counter: 1 out of 3
Updating learning rate to 5e-05
        iters: 100, epoch: 3 | loss: 0.2960369
        speed: 0.3391s/iter; left time: 641.9930s
        iters: 200, epoch: 3 | loss: 0.2784007
        speed: 0.0979s/iter; left time: 175.6230s
Epoch: 3 cost time: 25.066813468933105
Epoch: 3, Steps: 249 | Train Loss: 0.3106790 Vali Loss: 1.4348079 Test Loss: 0.6341387
EarlyStopping counter: 2 out of 3
Updating learning rate to 2.5e-05
        iters: 100, epoch: 4 | loss: 0.2686975
        speed: 0.3438s/iter; left time: 565.2685s
        iters: 200, epoch: 4 | loss: 0.2501120
        speed: 0.0978s/iter; left time: 150.9324s
Epoch: 4 cost time: 24.955880880355835
Epoch: 4, Steps: 249 | Train Loss: 0.2695364 Vali Loss: 1.4221642 Test Loss: 0.6396421
EarlyStopping counter: 3 out of 3
Early stopping
>>>>>>>testing : test_Autoformer_ETTh1_ftM_sl336_ll168_pl336_dm512_nh8_el2_dl1_df2048_fc1_ebtimeF_dtTrue_test_0<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 2545
test shape: (79, 32, 336, 7) (79, 32, 336, 7)
test shape: (2528, 336, 7) (2528, 336, 7)
**************************************************Evaluating the performance of each time step...**************************************************
Evaluate best model on test data for horizon 12, MAE: 0.4684
Evaluate best model on test data for horizon 24, MAE: 0.4695
Evaluate best model on test data for horizon 48, MAE: 0.4811
Evaluate best model on test data for horizon 96, MAE: 0.4928
Evaluate best model on test data for horizon 192, MAE: 0.4969
Evaluate best model on test data for horizon 288, MAE: 0.5060
Evaluate best model on test data for horizon 333, MAE: 0.5194
Evaluate best model on test data for horizon 334, MAE: 0.5238
Evaluate best model on test data for horizon 335, MAE: 0.5299
Evaluate best model on test data for horizon 336, MAE: 0.5556
mse:0.4933849275112152, mae:0.4951804280281067
zezhishao commented 2 years ago

I also calculated the metrics on the re-scaled (de-normalized) data, and there the gap is even more significant.
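
For context, a minimal sketch of what "metrics on the re-scaled data" could look like. It assumes the test dataset object exposes the `inverse_transform` of the `StandardScaler` fitted during loading (as the data loaders in this repo do); the function below is illustrative rather than the exact code I ran.

```python
import numpy as np

def rescaled_mae(preds, trues, dataset):
    # Undo the z-score normalization before computing the error, so the
    # MAE is reported in the original units of the series.
    n_vars = preds.shape[-1]
    preds_raw = dataset.inverse_transform(preds.reshape(-1, n_vars)).reshape(preds.shape)
    trues_raw = dataset.inverse_transform(trues.reshape(-1, n_vars)).reshape(trues.shape)
    return float(np.mean(np.abs(preds_raw - trues_raw)))
```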

zezhishao commented 2 years ago

It doesn't seem to be a consistently reproducible phenomenon: it doesn't always happen when I run the code multiple times.
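
For anyone trying to reproduce it, a generic sketch of pinning the random seeds before a run (even with seeds fixed, non-deterministic cuDNN kernels can still cause some run-to-run variation, which may be part of what I am seeing):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 2021):
    # Fix the Python, NumPy, and PyTorch RNGs so repeated runs start from the
    # same weight initialization and data shuffling order.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for determinism in cuDNN.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```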