There is an inconsistency in Isolation Forest performance reporting between Python and R: the R performance output does not include any anomaly scores. Am I doing something wrong?
Python
{code}
import h2o
from h2o.estimators.isolation_forest import H2OIsolationForestEstimator

h2o.init()

train = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_train.csv")
test = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_test.csv")

isofor_model = H2OIsolationForestEstimator(sample_size = 5, ntrees = 7, seed = 12345)
isofor_model.train(training_frame = train)

perf = isofor_model.model_performance()
perf

# Output:
# ModelMetricsAnomaly: isolationforest
# Reported on train data.
#
# Anomaly Score: 1.7285714285714286
# Normalized Anomaly Score: 0.6555555555555554
{code}
R
{code}
library(h2o)
h2o.init()

train <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_train.csv")
test <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/anomaly/ecg_discord_test.csv")

isofor_model <- h2o.isolationForest(training_frame = train, sample_size = 5, ntrees = 7, seed = 12345)
perf <- h2o.performance(isofor_model)
perf

# Output (no scores printed):
# H2OAnomalyDetectionMetrics: isolationforest
# Reported on training data.
# Metrics reported on Out-Of-Bag training samples
{code}
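For what it's worth, the underlying values do still seem to be reachable in R; only the printed summary omits them. Per-row scores come back from h2o.predict(), and the raw metric values can be read off the S4 metrics slot of the performance object. A minimal sketch, assuming the model and perf objects from the R snippet above; the exact key names inside perf@metrics ("mean_score", "mean_normalized_score") are an assumption on my part and may differ by H2O version:
{code}
# Per-row anomaly scores: Isolation Forest predictions return a frame
# with a normalized score ("predict") and the mean path length ("mean_length")
pred <- h2o.predict(isofor_model, test)
head(pred)

# Raw aggregate metrics from the S4 slot; key names are assumptions
perf@metrics$mean_score
perf@metrics$mean_normalized_score
{code}
If those slots hold the same values Python prints (1.7285714... and 0.6555555...), the discrepancy would be confined to the R print/show method rather than the metrics computation itself.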