thuml / Anomaly-Transformer

About Code release for "Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy" (ICLR 2022 Spotlight), https://openreview.net/forum?id=LzQQ89U1qm_
MIT License

Incorrect Test Set Evaluation in Line 291: F1 Score Discrepancy between thre_loader and test_loader #62

Open mojtaba-nafez opened 10 months ago

mojtaba-nafez commented 10 months ago

Hi there,

Excellent job on this! However, I've identified a potential issue in the testing part of your code. I'm currently working with the MSL dataset, and at line 291 (following the comment: # (3) evaluation on the test set) the model is evaluated on thre_loader instead of test_loader. thre_loader contains only 1% of the test data, and the F1 score reported in the paper (93.59%) appears to come from this evaluation. After correcting this by using test_loader instead of thre_loader, the final F1 score dropped to 86.49%.
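To make the concern concrete, here is a minimal sketch of how the measured F1 score depends on which split is evaluated once a threshold is fixed. It does not reproduce the repository's solver: `f1_at_threshold`, the synthetic scores, and the 1% subset are all hypothetical and chosen only for illustration.

```python
import numpy as np


def f1_at_threshold(threshold_scores, eval_scores, eval_labels, anomaly_ratio=1.0):
    """Pick a percentile threshold from threshold_scores, then score eval_scores.

    Hypothetical helper for illustration; not the repository's evaluation code.
    """
    # Scores above the (100 - anomaly_ratio) percentile are flagged as anomalies.
    thresh = np.percentile(threshold_scores, 100.0 - anomaly_ratio)
    pred = (eval_scores > thresh).astype(int)

    tp = int(np.sum((pred == 1) & (eval_labels == 1)))
    fp = int(np.sum((pred == 1) & (eval_labels == 0)))
    fn = int(np.sum((pred == 0) & (eval_labels == 1)))

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return thresh, precision, recall, f1


# Synthetic anomaly scores: anomalous points (about 5%) get a higher mean score.
rng = np.random.default_rng(0)
full_labels = (rng.random(10_000) < 0.05).astype(int)
full_scores = rng.normal(size=10_000) + 3.0 * full_labels

# Same threshold in both cases; only the evaluated split differs.
subset = slice(0, 100)  # hypothetical 1% subset of the evaluation data
_, _, _, f1_subset = f1_at_threshold(full_scores, full_scores[subset], full_labels[subset], 5.0)
_, _, _, f1_full = f1_at_threshold(full_scores, full_scores, full_labels, 5.0)
print(f"F1 on subset: {f1_subset:.4f}, F1 on full set: {f1_full:.4f}")
```

The two printed values generally differ, which is why the loader passed to the final evaluation step matters when comparing against a reported number.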

I will be looking forward to hearing your thoughts on this potential bug.

elwoodwgd commented 9 months ago

I also have this question. Why use thre_loader?

BITGJW commented 9 months ago

Same question.

lzz19980125 commented 7 months ago

Same question.

DarkFT commented 1 month ago

Same question. I think test_loader should be used to find the threshold. It's unfair to find the threshold on these:

train energy (5821800,) (train_loader)
test energy (73700,) (thre_loader)
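One way to read the two energy shapes is as window counts, if each window contributes one score per time step. The sketch below assumes a window length of 100, which is not stated in this thread, and uses a hypothetical `num_windows` helper and a made-up series length to show how the stride changes the number of windows a loader yields.

```python
def num_windows(series_len, win_size, step):
    """Number of windows of length win_size taken every `step` points."""
    if series_len < win_size:
        return 0
    return (series_len - win_size) // step + 1


WIN_SIZE = 100  # assumption: window length, not confirmed in this thread

# Energy lengths quoted in the comment above.
train_energy_len = 5_821_800
thre_energy_len = 73_700

# If every window yields WIN_SIZE scores, these are the implied window counts.
train_windows = train_energy_len // WIN_SIZE
thre_windows = thre_energy_len // WIN_SIZE
print(train_windows, thre_windows)

# Hypothetical series of 1000 points: stride 1 versus non-overlapping windows.
overlapping = num_windows(1000, WIN_SIZE, step=1)
non_overlapping = num_windows(1000, WIN_SIZE, step=WIN_SIZE)
print(overlapping, non_overlapping)
```

Under this assumption, a loader with stride 1 yields roughly `WIN_SIZE` times as many windows as a non-overlapping loader over the same series, so the large gap between the two energy lengths does not by itself show how much of the underlying data each loader covers.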