yuqinie98 / PatchTST

An official implementation of PatchTST: "A Time Series is Worth 64 Words: Long-term Forecasting with Transformers" (ICLR 2023). https://arxiv.org/abs/2211.14730
Apache License 2.0

After PatchTST encoder, why do permute in last two dims? #31

Closed · MasterKID223 closed this 1 year ago

MasterKID223 commented 1 year ago

Hello, you reshape u to (bs*nvars, patch_num, d_model) before the encoder:

https://github.com/yuqinie98/PatchTST/blob/b4c9f6fa7eaa5d86277d2da78026f06702cd85ad/PatchTST_supervised/layers/PatchTST_backbone.py#L164

why do you then permute z to (bs*nvars, d_model, patch_num)?

https://github.com/yuqinie98/PatchTST/blob/b4c9f6fa7eaa5d86277d2da78026f06702cd85ad/PatchTST_supervised/layers/PatchTST_backbone.py#L168-L170

In the next step, z (bs*nvars, d_model, patch_num) is fed into the head module, where it passes through a flatten layer. Can I flatten z as (-1, patch_num, d_model) instead of (-1, d_model, patch_num)?

https://github.com/yuqinie98/PatchTST/blob/b4c9f6fa7eaa5d86277d2da78026f06702cd85ad/PatchTST_supervised/layers/PatchTST_backbone.py#L74-L75

https://github.com/yuqinie98/PatchTST/blob/b4c9f6fa7eaa5d86277d2da78026f06702cd85ad/PatchTST_supervised/layers/PatchTST_backbone.py#L56-L57

https://github.com/yuqinie98/PatchTST/blob/b4c9f6fa7eaa5d86277d2da78026f06702cd85ad/PatchTST_supervised/layers/PatchTST_backbone.py#L120-L122
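
For readers following the links, here is a minimal sketch of the shape flow being asked about (paraphrased with assumed toy sizes, not the repo's exact code):

```python
# Sketch of the reshape -> permute -> flatten-head shape flow (toy sizes assumed).
import torch
import torch.nn as nn

bs, nvars, patch_num, d_model, target_window = 2, 7, 42, 128, 96

# Encoder output before the permute: (bs*nvars, patch_num, d_model)
z = torch.randn(bs * nvars, patch_num, d_model)

# What the backbone does: restore the nvars dimension, then swap the last two dims
z = z.reshape(-1, nvars, patch_num, d_model)   # (bs, nvars, patch_num, d_model)
z = z.permute(0, 1, 3, 2)                      # (bs, nvars, d_model, patch_num)

# Flatten head: collapse the last two dims, then project to the prediction length
flatten = nn.Flatten(start_dim=-2)
linear = nn.Linear(d_model * patch_num, target_window)
out = linear(flatten(z))                       # (bs, nvars, target_window)
print(out.shape)                               # torch.Size([2, 7, 96])
```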

yuqinie98 commented 1 year ago

Hi @MasterKID223, thanks for asking! This is just to keep the corresponding input and output dimensions the same, so that you can add more modules or use a different head rather than the flatten one (e.g., a Transformer-based decoder).