linkedin / isolation-forest

A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm with support for exporting in ONNX format.

Multiple Rows as One Data Point #30

Closed TAsUjxnMIL closed 2 years ago

TAsUjxnMIL commented 2 years ago

Hello, I have a general question about the Isolation Forest algorithm. My dataframe looks like this:

|       | Metric_1 | Metric_2 |
|-------|----------|----------|
| Row_1 |          |          |
| Row_2 |          |          |

Can Isolation Forest treat multiple rows as a single data instance? In other words, when Isolation Forest identifies an anomaly, the anomaly should refer to a group of rows, for example Row_1 and Row_2 together. Currently Isolation Forest flags individual rows as anomalies, but in my data set several rows have to be considered collectively, so only a group of rows can be anomalous. Do you know of a way to do this with Isolation Forest or with another algorithm?

jverbus commented 2 years ago

It sounds like you will need to do some aggregation (e.g., groupby, clustering) before or after you run the isolation forest algorithm. The details of how you choose to do this will be specific to your use case.
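As a rough sketch of the "aggregate before" option, the snippet below collapses all rows that belong together into one feature vector per group. It is plain Python rather than Spark, and it rests on assumptions that are not in the original question: a hypothetical `group_id` column that says which rows form one data point, a hypothetical helper named `aggregate_groups`, and mean/standard deviation/min/max as the summary statistics. Which statistics make sense depends on the use case.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate_groups(rows, key="group_id", metrics=("Metric_1", "Metric_2")):
    """Collapse all rows sharing `key` into one feature vector per group.

    `key` and the chosen summary statistics are illustrative assumptions.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)

    features = {}
    for group, members in grouped.items():
        vector = []
        for metric in metrics:
            values = [member[metric] for member in members]
            # Four summary statistics per metric: mean, population std, min, max.
            vector += [mean(values), pstdev(values), min(values), max(values)]
        features[group] = vector
    return features


# Made-up example data: rows A1 and A2 form group "A", one row forms group "B".
rows = [
    {"group_id": "A", "Metric_1": 1.0, "Metric_2": 10.0},
    {"group_id": "A", "Metric_1": 3.0, "Metric_2": 14.0},
    {"group_id": "B", "Metric_1": 2.0, "Metric_2": 12.0},
]

# One fixed-length vector per group; this table, not the raw rows,
# would be the input to the outlier detector.
group_features = aggregate_groups(rows)
```

Each entry of `group_features` would then be scored as a single instance, so an outlier score refers to the whole group of rows rather than to any one row.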

jverbus commented 2 years ago

I am closing this ticket because it is neither a feature request nor an issue with the isolation forest library.