[BUG] detect fake anomalies when the buckets are not completely full

MCdeamon7 commented 7 months ago

I generated fake data with a Gaussian function. The data points run from ~8 AM to ~6 PM for 30 days, each with its corresponding timestamp. When the data starts or ends, I frequently get spikes that trigger the anomaly detection algorithm. [image: anomalyDetectionSpike]

Looking at the data in "Visualize", I can see that, depending on the bucket size, the aggregation (avg) really does show random spikes: [image: spike]

But when I zoom in, the data are as expected. [image: zoom-visualize]

To reproduce the bug, you can import data generated with my Python script (https://github.com/MCdeamon7/data-generator). Import the data, then create an anomaly detection job on it.
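For anyone who wants a feel for the data shape without running the linked script, here is a rough sketch. It is not the linked script: the one-sample-per-minute rate, the mean of 100, the standard deviation of 5, and the few minutes of jitter on the start and end times (to mimic "~8AM" and "~6PM") are all assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta


def generate_samples(start_day, days=30, seed=0):
    """Return (timestamp, value) pairs, one per minute, roughly 08:00-18:00 each day.

    All numeric choices here (rate, mean, std dev, jitter) are illustrative
    assumptions, not values taken from the original data generator.
    """
    rng = random.Random(seed)
    samples = []
    for d in range(days):
        day = start_day + timedelta(days=d)
        # Jitter the start and end by 0-9 minutes so the daily window
        # does not line up exactly with bucket boundaries.
        t = day.replace(hour=8, minute=0) + timedelta(minutes=rng.randint(0, 9))
        end = day.replace(hour=18, minute=0) + timedelta(minutes=rng.randint(0, 9))
        while t <= end:
            samples.append((t, rng.gauss(100.0, 5.0)))
            t += timedelta(minutes=1)
    return samples


samples = generate_samples(datetime(2024, 1, 1))
```

Each pair can then be written out as a document with a timestamp field and a numeric field before importing.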

I suspect it has something to do with the bucket size, because when I change it from 5 to 10 or 15, the spikes disappear in some places and reappear in others.
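One way to check this hypothesis outside the plugin is to count how many samples land in each bucket and see which buckets are only partly filled. The sketch below assumes one sample per minute and fixed-width buckets aligned to the epoch; `bucketize` and `partial_buckets` are hypothetical helper names, and this is not a description of how the detector aggregates internally.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def bucketize(samples, bucket_minutes):
    """Group (timestamp, value) pairs into fixed-width, epoch-aligned buckets."""
    width = bucket_minutes * 60
    buckets = defaultdict(list)
    for ts, value in samples:
        seconds = int((ts - EPOCH).total_seconds())
        buckets[seconds - seconds % width].append(value)
    return buckets


def partial_buckets(samples, bucket_minutes):
    """Return (bucket_start, count) for buckets holding fewer samples than a
    full bucket would, assuming one sample per minute."""
    result = []
    for key, values in sorted(bucketize(samples, bucket_minutes).items()):
        if len(values) < bucket_minutes:
            result.append((EPOCH + timedelta(seconds=key), len(values)))
    return result


# One sample per minute from 08:03 to 08:27 (a window that does not
# start or end on a bucket boundary).
start = datetime(2024, 1, 1, 8, 3)
samples = [(start + timedelta(minutes=i), 100.0) for i in range(25)]

for minutes in (5, 10):
    print(minutes, partial_buckets(samples, minutes))
```

With 5-minute buckets the partly filled ones start at 08:00 and 08:25; with 10-minute buckets they start at 08:00 and 08:20. So the partly filled buckets move when the bucket size changes, which would fit the spikes moving around.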

I'm running it in a Docker container on Linux, using a build I recompiled myself from version 2.11.0 (I only changed the maximum number of features from 5 to 10).

Sorry for my English and thanks to everyone that can help me.

minalsha commented 5 months ago

@kaituo thoughts?

kaituo commented 4 months ago

@MCdeamon7 Can you list the steps to generate data using https://github.com/MCdeamon7/data-generator ?

Also, what do you mean by bucket size? What's your detector configuration?

opensearch-project / anomaly-detection

[BUG] detect fake anomalies when the buckets are not completely full #1108