h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

One-Class Classification to detect anomaly or not #16089

Open dmresearch15 opened 7 months ago

dmresearch15 commented 7 months ago

The title is provisional, because the problem is somewhat nuanced.

I have a business problem in which I received one dataset with a fixed set of observations that is guaranteed to be anomaly-free. This first dataset has n1 features.

I then received a second dataset with n2 features (n2 < n1), all of which are also present in the first dataset.

The challenge is to determine whether some of the observations from the second dataset can be merged into the first dataset, and to define a business rule for the remaining observations in the second dataset.

I am looking for a model suited to this problem.

wendycwong commented 7 months ago

If you are looking to detect anomalies, there are several algorithms I would point you to:

  1. A Deep Learning autoencoder: https://github.com/h2oai/h2o-tutorials/blob/master/best-practices/anomaly-detection/anomaly_detection.ipynb
  2. Isolation Forest or Extended Isolation Forest: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/if.html, https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/eif.html
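
As a rough illustration of the second option applied to the question above, here is a minimal sketch that trains an Isolation Forest on the anomaly-free dataset using only the features shared with the second dataset, then scores the second dataset. The file paths, the helper names (`shared_features`, `quantile_threshold`, `flag_anomalies`, `score_second_dataset`), and the 0.99 quantile cutoff are all assumptions made for illustration, not something prescribed in this thread. It also assumes pandas is installed so that `as_data_frame()` returns a DataFrame.

```python
def shared_features(clean_columns, new_columns):
    """Return the columns present in both datasets, in the new dataset's order."""
    clean = set(clean_columns)
    return [c for c in new_columns if c in clean]


def quantile_threshold(scores, q):
    """Pick a cutoff as the (approximate) q-th quantile of the given scores."""
    ranked = sorted(scores)
    return ranked[min(len(ranked) - 1, int(q * len(ranked)))]


def flag_anomalies(scores, threshold):
    """Mark every score strictly above the threshold as anomalous."""
    return [s > threshold for s in scores]


def score_second_dataset(clean_path, new_path, q=0.99, seed=42):
    # Imported lazily so the helpers above can be used without an H2O cluster.
    import h2o
    from h2o.estimators import H2OIsolationForestEstimator

    h2o.init()
    clean = h2o.import_file(clean_path)  # hypothetical path: anomaly-free data (n1 features)
    new = h2o.import_file(new_path)      # hypothetical path: second dataset (n2 features)

    # Train only on the features that both datasets have in common.
    features = shared_features(clean.columns, new.columns)
    model = H2OIsolationForestEstimator(ntrees=100, seed=seed)
    model.train(x=features, training_frame=clean)

    # For Isolation Forest, predict() returns a normalized anomaly score
    # in the "predict" column and the mean path length in "mean_length".
    clean_scores = model.predict(clean)["predict"].as_data_frame()["predict"].tolist()
    new_scores = model.predict(new)["predict"].as_data_frame()["predict"].tolist()

    # Assumption: rows scoring above the q-quantile of the clean data's
    # scores are treated as anomalous and held back for a business rule.
    threshold = quantile_threshold(clean_scores, q)
    return flag_anomalies(new_scores, threshold)
```

Rows flagged `False` would be candidates for merging into the first dataset, and rows flagged `True` would be left for the business rule. The cutoff is a judgment call and would need tuning against domain knowledge.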