jimmylihui opened this issue 1 year ago
From my understanding, some traditional pretraining techniques (like in BERT) mask the input data, which seems fine to me.
The `patch_masking` method of the `PatchMaskCB` class also works correctly, as described in the paper.
In normal self-supervised learning, however, pretraining is usually achieved by reconstructing the data under masking. In your implementation, pretraining is instead achieved by predicting the task from the masked input. What is the reason for this design?
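For concreteness, here is a minimal sketch of the masked-reconstruction objective I am referring to (the helper `random_patch_mask`, the `model` interface, and the `mask_ratio` value are all hypothetical illustrations, not the repo's actual API):

```python
import torch
import torch.nn.functional as F

def random_patch_mask(x, mask_ratio=0.4):
    """Zero out a random subset of patches; return masked input and the mask.

    x: [batch, n_patches, patch_len]
    """
    batch, n_patches, _ = x.shape
    mask = torch.rand(batch, n_patches, device=x.device) < mask_ratio  # True = masked
    x_masked = x.masked_fill(mask.unsqueeze(-1), 0.0)
    return x_masked, mask

def reconstruction_pretrain_step(model, x):
    """BERT/MAE-style objective: predict the masked patches themselves,
    with the loss computed only on the masked positions."""
    x_masked, mask = random_patch_mask(x)
    pred = model(x_masked)                  # [batch, n_patches, patch_len]
    loss = F.mse_loss(pred[mask], x[mask])  # reconstruction loss on masked patches only
    return loss
```

That is the objective I would have expected for pretraining, so I am curious why the implementation predicts the downstream target from the masked input instead.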