numenta / NAB

The Numenta Anomaly Benchmark

negative (and pretty big) numbers in final results #324

Closed · evgenij1987 closed this issue 4 years ago

evgenij1987 commented 6 years ago

Hi, I have the following problem. I am trying to evaluate an EWMA algorithm using NAB and get the following results:

```json
"EWMA": {
    "reward_low_FN_rate": -239.29681425287356,
    "reward_low_FP_rate": -897.695072875,
    "standard": -404.6348765517242
}
```

Do you have any hint as to why I get these negative (and pretty big) numbers?

Thank you in advance. EWMA.zip

scottpurdy commented 6 years ago

@evgenij1987 - Thanks for reaching out.

First, just to check: have you run all the steps, in the right order? Something like this:

```
python run.py -d YOUR_DETECTOR_NAME --detect --optimize --score --normalize
```

If the problem is still there, then we can check the optimizer. The optimize step should ensure that in the worst case no detections are made and you get a score similar to the null detector (0.0). If you are running all steps and still get scores below 0.0, you can check whether the optimizer is failing to set the threshold correctly (there have been some reports that it fails in some cases) by:

- (A) running just `--detect`,
- (B) manually setting the threshold (see the sketch after this list), and
- (C) running `--score --normalize`.

When manually setting the threshold, set it to a value above the largest anomaly score output by your detector so that no detections are made. The result should then match the null detector.
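For step (B), a minimal sketch of forcing the threshold high enough that nothing is detected. It assumes NAB's `config/thresholds.json` maps detector → profile → `{"score", "threshold"}`; check your copy of the file for the exact layout before running this.

```python
import json

# Force the "EWMA" threshold above every possible anomaly score (max 1.0)
# in all scoring profiles, so that no detections are made.
THRESHOLDS_PATH = "config/thresholds.json"

with open(THRESHOLDS_PATH) as f:
    thresholds = json.load(f)

for profile in thresholds.get("EWMA", {}):
    thresholds["EWMA"][profile]["threshold"] = 1.1  # above the max score of 1.0

with open(THRESHOLDS_PATH, "w") as f:
    json.dump(thresholds, f, indent=2, sort_keys=True)
```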

If manually setting the threshold results in a final score of 0.0, then it may be a bug in the optimizer. Let me know what you find!

evgenij1987 commented 6 years ago

Hi! Thank you for your response. I was actually running only:

```
python run.py -d EWMA --optimize --score --normalize
```

since I used a Java implementation of EWMA and thus just provided the anomaly scores in the results folder. I created every file with anomaly scores with the following columns:

```
timestamp,value,anomaly_score,label,S(t)_reward_low_FP_rate,S(t)_reward_low_FN_rate,S(t)_standard
```

The anomaly_score I put there is just 1 or 0. For the label I put 1 if the respective timestamp is contained in combined_labels.json. The entries for the S(t)_reward_low_FP_rate, S(t)_reward_low_FN_rate, and S(t)_standard columns are just filled with 0.
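For illustration, a minimal sketch of how such a results file could be written from externally computed scores. The `write_results` helper and its arguments are hypothetical stand-ins, not the actual Java exporter; `rows` represents the detector's output and `anomalous` the timestamps from combined_labels.json.

```python
import csv

def write_results(path, rows, anomalous):
    """Write one NAB results file with the columns described above.

    rows      -- iterable of (timestamp, value, anomaly_score) tuples
    anomalous -- set of timestamps labeled anomalous in combined_labels.json
    """
    header = ["timestamp", "value", "anomaly_score", "label",
              "S(t)_reward_low_FP_rate", "S(t)_reward_low_FN_rate",
              "S(t)_standard"]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for timestamp, value, score in rows:
            label = 1 if timestamp in anomalous else 0
            # Per-row S(t) columns left at 0, as in the setup described above.
            writer.writerow([timestamp, value, score, label, 0, 0, 0])
```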

Unfortunately, manually setting the threshold (e.g. to 0.5, 0.9, or 1.0) and running `python run.py -d EWMA --score --normalize` didn't change the results. I hope you can still help me.

scottpurdy commented 6 years ago

What about setting the threshold to 1.1, just to make sure that there are no detections? The results should then match the null detector. If not, can you share the file with your detections?

evgenij1987 commented 6 years ago

So I tried 1.1 now, and the results match the null detector as you said. But what does that tell me? Obviously, setting the threshold to 1.1 means that all anomaly scores from the EWMA detector are simply ignored with regard to the final score. So does that mean there is no way for the algorithm to score better than the null detector? I would expect it to be at least as good as the random detector, but correct me if I am wrong.

scottpurdy commented 6 years ago

Ok, so the optimizer is erroneously setting the threshold to <= 1. But now the question is: why doesn't lowering the threshold to allow some detections improve the results over the null detector? Because there are many false positives. Normally, algorithms output varying confidence for different detections, so the optimizer can find a threshold that includes a few high-confidence anomalies but excludes the others once the false positives start to add up.

Because you set every anomaly score to either 0 or 1, all candidate anomalies are counted as detections whenever the threshold drops to 1 or below. Since this results in poor scores, a good portion of them must be false positives.
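To make that concrete, here is a small illustration with made-up scores: binary scores give the optimizer only two possible detection sets (everything or nothing), while graded scores let each threshold select a different confidence-ranked subset.

```python
# Made-up scores for six data points, purely for illustration.
binary = [0, 1, 0, 1, 1, 0]
weighted = [0.05, 0.92, 0.10, 0.55, 0.71, 0.02]

def detections(scores, threshold):
    """Indices of points flagged as anomalous at a given threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]

for t in (0.5, 0.8, 1.0, 1.1):
    print(t, detections(binary, t), detections(weighted, t))
# binary:   identical detections for every t <= 1.0, none above it;
# weighted: each threshold selects a different, confidence-ranked subset.
```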

My recommendation is to use weighted (graded) anomaly scores, so that the optimizer can hopefully find a threshold that includes enough true positives without too many false positives.
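As one possible way to get graded scores out of an EWMA (an illustrative sketch, not the poster's Java implementation): standardize each point's deviation from the smoothed mean by an exponentially weighted estimate of the standard deviation, then squash it into [0, 1].

```python
import math

def ewma_scores(values, alpha=0.1):
    """Graded anomaly scores in [0, 1) from an EWMA of mean and variance."""
    mean, var = values[0], 0.0
    scores = []
    for x in values:
        deviation = x - mean
        std = math.sqrt(var) if var > 0 else 1.0
        z = abs(deviation) / std
        # Larger standardized deviation -> score closer to 1.
        scores.append(1.0 - math.exp(-0.5 * z))
        # Standard exponentially weighted updates for mean and variance.
        mean += alpha * deviation
        var = (1 - alpha) * (var + alpha * deviation * deviation)
    return scores
```

A score like this gives the optimizer a full range of thresholds to sweep, rather than the all-or-nothing choice that a binary score forces.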