yuqinie98 / PatchTST

An official implementation of PatchTST: "A Time Series is Worth 64 Words: Long-term Forecasting with Transformers." (ICLR 2023) https://arxiv.org/abs/2211.14730
Apache License 2.0

mix channel implementation #64

Closed EllaHxyz closed 11 months ago

EllaHxyz commented 1 year ago

Hi, may I ask how you implemented the channel-mixing model used for the comparison in Figure 7? The paper mentions reshaping (B, M, P, N) to (B, M·P, N), but it's not clear what comes next. Did you feed the (B, M·P, N) tensor to a projection/embedding to produce (B, D, N)? And how did you reshape it back to (B, M, D, N) after the transformer encoder? If the code is present in the repository and I overlooked it, please kindly point me to it. Thank you!

yuqinie98 commented 12 months ago

Hi @EllaHxyz , yes you are right, we just treat M·P as the feature channel, so there are M·P features now. This is then used directly to predict the output (B, M, T) via the transformer and a linear head.
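A minimal sketch of the channel-mixing embedding step described above, assuming PyTorch and hypothetical dimension values (B batch, M variables, P patch length, N number of patches, D model dimension; none of these names come from the repository):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
B, M, P, N, D = 4, 7, 16, 64, 128

x = torch.randn(B, M, P, N)      # patched multivariate input (B, M, P, N)
x = x.reshape(B, M * P, N)       # channel mixing: M*P features per patch index
tokens = x.transpose(1, 2)       # (B, N, M*P): one token per patch position
proj = nn.Linear(M * P, D)       # assumed value embedding / projection
z = proj(tokens)                 # (B, N, D) — the (B, D, N) tensor up to transpose
```

The sequence `z` would then be fed to a standard transformer encoder; the head that maps the encoder output back to forecasts is discussed in the follow-up below.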

EllaHxyz commented 12 months ago

With an output of [B,D,N] from the transformer encoder, how did you reshape it to [B,M,T]? Could you give more details on the implementation after the transformer? Thanks!

yuqinie98 commented 11 months ago

Hi, you can flatten the last two dimensions and use a linear layer to do it.
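A sketch of that flatten-and-project head, assuming a PyTorch encoder output of shape (B, D, N) as discussed above; all sizes and the single shared linear head are assumptions for illustration, not the repository's exact implementation:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: B batch, M variables, N patches, D model dim, T horizon.
B, M, N, D, T = 4, 7, 64, 128, 96

z = torch.randn(B, D, N)         # stand-in for the transformer encoder output
flat = z.reshape(B, D * N)       # flatten the last two dimensions
head = nn.Linear(D * N, M * T)   # one linear layer predicting all channels jointly
y = head(flat).reshape(B, M, T)  # final forecast (B, M, T)
```

Note that, unlike the channel-independent PatchTST head, this single linear layer mixes information from all M variables when producing each channel's forecast.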