Minimax strategy and early_stopping

thuml / Anomaly-Transformer

About Code release for "Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy" (ICLR 2022 Spotlight), https://openreview.net/forum?id=LzQQ89U1qm_

MIT License


Jumping in uninvited :). What follows is my own understanding, for reference only; I can't guarantee it is correct. Please note that the order in the code is Max-Min, not Min-Max as in the paper.

Min-Max cannot be used because the randomly initialized series association (SA) carries no meaning and may be very far from the target, which makes it hard for training to make progress. In extreme cases, the model can no longer use more distant context for modeling.

The prior association (PA), by contrast, is shaped from initialization by the unimodal statistical properties of its Gaussian kernel, so it naturally carries a neighborhood meaning and can be used directly as the label for Max-phase training.
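To make the contrast concrete, here is a minimal sketch in plain Python (not the repository's code). It assumes PA row i is a Gaussian kernel over the distance |j - i|, normalised per row, and uses a softmax over random scores as a stand-in for an untrained SA. The helper names `gaussian_prior`, `random_series`, and `argmax`, and the values of `n`, `sigma`, and `seed`, are all made up for illustration.

```python
import math
import random

def gaussian_prior(n, sigma):
    """Row i is a Gaussian kernel over |j - i|, normalised to sum to 1."""
    rows = []
    for i in range(n):
        w = [math.exp(-((j - i) ** 2) / (2 * sigma ** 2)) for j in range(n)]
        total = sum(w)
        rows.append([x / total for x in w])
    return rows

def random_series(n, seed=0):
    """Softmax over random scores, standing in for untrained attention."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        scores = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = max(scores)  # subtract the max for numerical stability
        e = [math.exp(s - m) for s in scores]
        total = sum(e)
        rows.append([x / total for x in e])
    return rows

def argmax(row):
    """Index of the largest entry in a row."""
    return max(range(len(row)), key=row.__getitem__)

prior = gaussian_prior(8, sigma=1.0)
series = random_series(8)

# Each prior row peaks at its own position, whatever sigma is.
prior_peaks = [argmax(r) for r in prior]
# The random rows peak wherever the random scores happen to be largest.
series_peaks = [argmax(r) for r in series]
```

Under these assumptions the PA rows always peak on the diagonal, which is the "natural neighborhood meaning" mentioned above, while the SA peaks carry no such structure before training.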

That's why we use Max-Min rather than Min-Max. Hope this helps.
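As a rough illustration of the two phases, the sketch below computes a symmetrised KL discrepancy between PA and SA and the two signed objectives, following my reading of the paper's loss (reconstruction error minus a weighted discrepancy). It is plain Python with no autograd, so "held fixed" appears only as a comment; in a real training loop that would be a detach. The function names, the toy matrices, and `k=3.0` are assumptions for illustration, and the sketch covers a single layer only.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def association_discrepancy(prior, series):
    """Mean over rows of the symmetrised KL between PA and SA rows."""
    per_row = [kl(p, s) + kl(s, p) for p, s in zip(prior, series)]
    return sum(per_row) / len(per_row)

def phase_losses(rec_loss, prior, series, k):
    """Return the (Max-phase, Min-phase) objectives, both to be minimised."""
    dis = association_discrepancy(prior, series)
    # Max phase: PA held fixed as the label, SA updated to enlarge the discrepancy.
    max_phase = rec_loss - k * dis
    # Min phase: SA held fixed, PA updated to shrink the discrepancy.
    min_phase = rec_loss + k * dis
    return max_phase, min_phase

# Toy 3x3 associations: a peaked PA and a flatter SA (made-up numbers).
prior = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]
series = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]

dis = association_discrepancy(prior, series)
max_phase, min_phase = phase_losses(rec_loss=1.0, prior=prior, series=series, k=3.0)
```

Running the Max phase first means the first gradient step for SA is taken against a PA that already has neighborhood structure, which is the point of the explanation above.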

Minimax strategy and early_stopping #45