sintel-dev / Orion

Library for detecting anomalies in signals
https://sintel.dev/Orion/
MIT License
1.05k stars 162 forks

Discrepancy in Reproduced F1 Scores Compared to Published Results #539

Closed jed-ho closed 1 month ago

jed-ho commented 6 months ago

Description

I am attempting to reproduce the results of the research paper AER: Auto-Encoder with Regression for Time Series Anomaly Detection. I ran benchmark.py and obtained per-signal results, but they show a significant discrepancy from the F1 scores reported in the paper. Could you please help investigate this discrepancy? Any guidance on whether I might be missing a step or misinterpreting the results would be greatly appreciated.

What I Did

  1. Run benchmark.py and obtain the results for each signal.
  2. Compare these results with those in Orion/benchmark/results/0.6.0.csv; the values in my results do match those in the file.
  3. Calculate the average F1 scores across signals from my results (a rough sketch of this step follows the list).
  4. Compare these average F1 scores with leaderboard.xlsx; there are minor differences between my results and the spreadsheet.
  5. Compare both sets of results with the F1 scores published in the paper; they exhibit a significant discrepancy.
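Roughly, the averaging in step 3 looks like this (the column names are illustrative and may differ from the actual benchmark.py output):

```python
import pandas as pd

# Per-signal scores produced by benchmark.py (the released equivalent is
# Orion/benchmark/results/0.6.0.csv). Column names here are assumptions.
results = pd.read_csv('results.csv')

# Average the per-signal F1 scores for every pipeline/dataset pair, which is
# roughly the quantity reported in leaderboard.xlsx and in the paper.
summary = (
    results
    .groupby(['pipeline', 'dataset'])['f1']
    .mean()
    .unstack('dataset')
)
print(summary.round(3))
```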
sarahmish commented 5 months ago

Hi @jed-ho - thank you for using Orion!

After running benchmark.py, you can use get_f1_scores in results.py to get the overview F1 scores and write_results to produce leaderboard.xlsx.
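For example, a rough call pattern (the exact signatures are in results.py; the import path and arguments below are only an assumption):

```python
import pandas as pd
from results import get_f1_scores, write_results  # Orion/benchmark/results.py

# Per-signal scores written by benchmark.py (path is an example).
results = pd.read_csv('results.csv')

# Assumed usage: summarize the per-signal scores into overview F1 values,
# then write the leaderboard spreadsheet -- check results.py for the real signatures.
overview = get_f1_scores(results)
write_results(results, 'leaderboard.xlsx')
print(overview)
```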

In Orion, we publish the benchmark with every release to help navigate the changes that happen due to external factors such as dependency changes and package updates. Your results should be consistent with the latest published benchmark results.
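As a quick sanity check, something like this can flag any per-signal differences against the released scores (the column names are assumptions; adjust them to the actual CSV layout):

```python
import pandas as pd

# Local run vs. the published results for the release matching your install.
mine = pd.read_csv('results.csv')
published = pd.read_csv('Orion/benchmark/results/0.6.0.csv')

# Align on pipeline + signal and flag F1 values that differ noticeably.
merged = mine.merge(published, on=['pipeline', 'signal'],
                    suffixes=('_mine', '_published'))
mismatch = merged[(merged['f1_mine'] - merged['f1_published']).abs() > 1e-3]
print(mismatch[['pipeline', 'signal', 'f1_mine', 'f1_published']])
```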

Hope this answers your questions!