yahoo / egads

A Java package to automatically detect anomalies in large scale time-series data

expected series - how to produce #28

Closed chenbekor closed 8 years ago

chenbekor commented 8 years ago

It is not clear from the code how to use the engine, specifically how to generate the expected series.

In AnomalyDetectionModel there is the following method:

// detect anomalies.
public Anomaly.IntervalSequence detect(
        TimeSeries.DataSequence observedSeries,
        TimeSeries.DataSequence expectedSeries) throws Exception;

I'm not sure how expectedSeries is produced.

In the unit tests it is loaded from a file, "src/test/resources/modeloutput" + refWindows[w] + "_" + drops[d] + ".csv", which I also don't understand.

Any help is appreciated.

nlaptev commented 8 years ago

The expected time-series is produced from the many time-series models available (src/main/java/com/yahoo/egads/models/tsmm/). Here is a sample test-case for one of the models: egads/src/test/java/com/yahoo/egads/TestAnomalyDetect.java

chenbekor commented 8 years ago

I did review this file, but I'm not sure I understand the flow.

First, the test case loads an actual_metric time series (line 40).

Then there is a loop with an inner loop in which another time series is loaded from disk:

"src/test/resources/modeloutput" + refWindows[w] + "_" + drops[d] + ".csv"

Then the test case trains 3 detection models using the actual vs. the expected series, but both series are loaded from disk, so I can't figure out how to learn from this.

At runtime I only have the actual time series. How do I produce the expected time series?

It seems documentation is missing. Thanks for helping!

nlaptev commented 8 years ago

We are populating the expected time-series using the predict() method. Specifically, in the file I referenced previously we have:

OlympicModel model = new OlympicModel(p);
model.train(actual_metric.get(0).data);
TimeSeries.DataSequence sequence = new TimeSeries.DataSequence(
        metrics.get(0).startTime(), metrics.get(0).lastTime(), 3600);
sequence.setLogicalIndices(metrics.get(0).startTime(), 3600);
model.predict(sequence);

The model.predict(sequence) call is the one that uses a forecasting model to populate the expected time-series sequence.
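
The train, then predict, then detect flow can be illustrated without EGADS on the classpath. The sketch below is only an analogy: SeasonalMeanModel, its period parameter, and the fixed-threshold detect method are hypothetical stand-ins invented for this example, not OlympicModel, TimeSeries.DataSequence, or any other EGADS type. What it mirrors is the shape of the calls: the model is trained on the observed series, an empty sequence is allocated, and predict fills that sequence in place with expected values.

```java
// Hypothetical stand-in classes; none of these are EGADS types.
final class ExpectedSeriesSketch {

    /** Toy forecaster: expected value = mean of the values seen at the same position in each past period. */
    static final class SeasonalMeanModel {
        private final int period;
        private double[] seasonalMean;

        SeasonalMeanModel(int period) {
            if (period <= 0) {
                throw new IllegalArgumentException("period must be positive");
            }
            this.period = period;
        }

        // Learn one mean per position in the period from the observed history.
        void train(double[] observed) {
            double[] sum = new double[period];
            int[] count = new int[period];
            for (int i = 0; i < observed.length; i++) {
                sum[i % period] += observed[i];
                count[i % period]++;
            }
            seasonalMean = new double[period];
            for (int s = 0; s < period; s++) {
                seasonalMean[s] = count[s] == 0 ? 0.0 : sum[s] / count[s];
            }
        }

        // Fill the caller-supplied sequence with expected values, in place.
        void predict(double[] expected) {
            if (seasonalMean == null) {
                throw new IllegalStateException("call train() first");
            }
            for (int i = 0; i < expected.length; i++) {
                expected[i] = seasonalMean[i % period];
            }
        }
    }

    // Flag points whose absolute deviation from the expected value exceeds the threshold.
    static boolean[] detect(double[] observed, double[] expected, double threshold) {
        if (observed.length != expected.length) {
            throw new IllegalArgumentException("series must have the same length");
        }
        boolean[] flags = new boolean[observed.length];
        for (int i = 0; i < observed.length; i++) {
            flags[i] = Math.abs(observed[i] - expected[i]) > threshold;
        }
        return flags;
    }

    public static void main(String[] args) {
        double[] observed = {10, 20, 30, 10, 20, 30, 10, 95, 30};

        SeasonalMeanModel model = new SeasonalMeanModel(3);
        model.train(observed);                           // 1. train on the observed series

        double[] expected = new double[observed.length]; // 2. allocate an empty sequence
        model.predict(expected);                         // 3. the model populates it

        boolean[] flags = detect(observed, expected, 30.0); // 4. compare observed vs. expected
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {
                System.out.println("anomaly at index " + i);
            }
        }
    }
}
```

With this input only index 7 is flagged: the expected value there is the mean of 20, 20 and 95, which is 45, and the observed 95 deviates from it by 50.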

ssinhaonline commented 7 years ago

@nlaptev Suppose I want to pass in data of 30 days as my training time series, and then use the model to detect anomaly points for 1 day as my test time series. What should I do?

Should we pass in testDs like so:

ArrayList<Anomaly> anomalyList = ad.detect(ad.metric, forecastDs);

That would mean changing the forecast call from:

ArrayList<TimeSeries.DataSequence> list = ma.forecast(ma.metric.startTime(), ma.metric.lastTime());

to:

ArrayList<TimeSeries.DataSequence> list = ma.forecast(forecastDs.startTime(), forecastDs.lastTime());

What other changes need to be done? Or is there an easier way to achieve this?

Thanks for helping.
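
For readers with the same question, the train-on-history, score-a-new-window setup can be pictured with a small self-contained sketch. It deliberately avoids EGADS classes, so it says nothing about which EGADS methods would need to change. Every name in it (TrainThenDetectSketch, hourlyMeans, forecast, residualStd, detect) is hypothetical, and the hourly sampling, daily seasonality, and 3-sigma threshold are assumptions chosen for illustration. The point it shows is that the forecast window starts where the training data ends, and that the threshold is derived from the training residuals only.

```java
// Hypothetical sketch; none of these names are EGADS classes or methods.
final class TrainThenDetectSketch {
    static final int PERIOD = 24; // assumption: hourly samples with a daily pattern

    // Mean of the training values seen at each position within the day.
    static double[] hourlyMeans(double[] train) {
        double[] sum = new double[PERIOD];
        int[] count = new int[PERIOD];
        for (int i = 0; i < train.length; i++) {
            sum[i % PERIOD] += train[i];
            count[i % PERIOD]++;
        }
        double[] means = new double[PERIOD];
        for (int h = 0; h < PERIOD; h++) {
            means[h] = count[h] == 0 ? 0.0 : sum[h] / count[h];
        }
        return means;
    }

    // Expected values for `length` points that start right after the training data.
    static double[] forecast(double[] means, int trainLength, int length) {
        double[] expected = new double[length];
        for (int i = 0; i < length; i++) {
            expected[i] = means[(trainLength + i) % PERIOD];
        }
        return expected;
    }

    // Root-mean-square of the training residuals, used to scale the threshold.
    static double residualStd(double[] train, double[] means) {
        double sumSq = 0.0;
        for (int i = 0; i < train.length; i++) {
            double r = train[i] - means[i % PERIOD];
            sumSq += r * r;
        }
        return Math.sqrt(sumSq / train.length);
    }

    // Flag test points that deviate from the forecast by more than the threshold.
    static boolean[] detect(double[] observed, double[] expected, double threshold) {
        boolean[] flags = new boolean[observed.length];
        for (int i = 0; i < observed.length; i++) {
            flags[i] = Math.abs(observed[i] - expected[i]) > threshold;
        }
        return flags;
    }

    // Synthetic daily shape shared by the training and test data.
    static double dailyPattern(int i) {
        return 100.0 + 10.0 * Math.sin(2.0 * Math.PI * (i % PERIOD) / PERIOD);
    }

    // 30 synthetic training days, then 1 test day with a spike injected at hour 12.
    static boolean[] demo() {
        double[] train = new double[30 * PERIOD];
        for (int i = 0; i < train.length; i++) {
            double wobble = (i / PERIOD) % 2 == 0 ? 1.0 : -1.0; // alternates by day
            train[i] = dailyPattern(i) + wobble;
        }
        double[] test = new double[PERIOD];
        for (int i = 0; i < test.length; i++) {
            test[i] = dailyPattern(train.length + i);
        }
        test[12] += 50.0;

        double[] means = hourlyMeans(train);                            // train on 30 days
        double[] expected = forecast(means, train.length, test.length); // forecast the next day
        double threshold = 3.0 * residualStd(train, means);             // 3-sigma rule (assumption)
        return detect(test, expected, threshold);                       // score the test day only
    }

    public static void main(String[] args) {
        boolean[] flags = demo();
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {
                System.out.println("anomaly at test hour " + i);
            }
        }
    }
}
```

In this sketch the test day never feeds back into training, which keeps the injected spike from inflating the threshold it is judged against.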