yuqinie98 / PatchTST

An official implementation of PatchTST: "A Time Series is Worth 64 Words: Long-term Forecasting with Transformers." (ICLR 2023) https://arxiv.org/abs/2211.14730
Apache License 2.0

Question about self-supervised learning #75

Open jimmylihui opened 1 year ago

jimmylihui commented 1 year ago

In typical self-supervised learning, pretraining is usually done by reconstructing the data under masking. However, in your implementation, pretraining appears to be done by predicting the task from a masked input. What is the reason for this design?

vincent05r commented 11 months ago

From my understanding, some traditional pretraining techniques (like BERT) also mask the input data. It seems fine to me.

DSonDH commented 3 months ago

The patch_masking method of the PatchMaskCB class works correctly, in the way described in their paper: patches are randomly masked and the pretraining objective reconstructs the masked patches.
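
For reference, here is a minimal sketch of masked-patch reconstruction pretraining as described in the paper. This is not the repo's actual code; the function names `create_patches`, `random_patch_masking`, and `pretrain_step`, the shapes, and the `encoder`/`head` modules are illustrative assumptions (the real logic lives in the PatchMaskCB callback).

```python
import torch

def create_patches(x, patch_len, stride):
    # x: [batch, seq_len, n_vars] -> patches: [batch, n_patches, n_vars, patch_len]
    return x.unfold(dimension=1, size=patch_len, step=stride)

def random_patch_masking(patches, mask_ratio):
    # Zero out a random subset of patches per variable; return masked patches and a boolean mask.
    bs, n_patches, n_vars, patch_len = patches.shape
    n_keep = int(n_patches * (1 - mask_ratio))
    noise = torch.rand(bs, n_patches, n_vars, device=patches.device)
    ids_shuffle = noise.argsort(dim=1)                 # random patch order per series
    mask = torch.ones(bs, n_patches, n_vars, device=patches.device)
    mask.scatter_(1, ids_shuffle[:, :n_keep], 0.0)     # 0 = kept, 1 = masked
    masked = patches * (1 - mask).unsqueeze(-1)        # zero the masked patches
    return masked, mask.bool()

def pretrain_step(encoder, head, x, patch_len=12, stride=12, mask_ratio=0.4):
    # Hypothetical encoder/head: together they map masked patches back to patch values.
    patches = create_patches(x, patch_len, stride)
    masked, mask = random_patch_masking(patches, mask_ratio)
    pred = head(encoder(masked))                       # same shape as `patches`
    loss = ((pred - patches) ** 2).mean(dim=-1)        # per-patch MSE
    return loss[mask].mean()                           # score only the masked patches
```

The key point, under these assumptions, is that the loss is averaged only over masked positions, so pretraining is reconstruction of the masked patches rather than the downstream forecasting task.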