szilard / benchm-ml

A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
MIT License

New bench-ml 4-h2o.R for H2O cluster version: 3.8.3.3 #42

Closed tobigithub closed 7 years ago

tobigithub commented 7 years ago

Hi, this is the corrected R code for H2O cluster version 3.8.3.3 and R 3.3.1. The old code would not run under these versions. The final AUC with sample_rate = 1.0 for 1 million records is 0.77, which tops the old results.

For 10M records the AUC is 0.7922 on a quad-core CPU @ 4 GHz in 1676.02 seconds (more accurate and also 2x faster than the current report with a 32-thread machine, and using only 2 GB of RAM).

This needs code validation.


# works for H2O Flow 3.8.3.3 and R 3.3.1 (July 2016)
# load H2O Flow at http://localhost:54321/flow/index.html
library(h2o)
# because H2O is limited to two cores by default, we need to assign all cores/threads
h2o.init(nthreads = -1)

# load data from current directory
dx_train <- h2o.importFile(path = "train-1m.csv")
dx_test <- h2o.importFile(path = "test.csv")

# assign variables
Xnames <- names(dx_train)[which(names(dx_train)!="dep_delayed_15min")]

# start training H2O random forest
system.time({
  md <- h2o.randomForest(x = Xnames, y = "dep_delayed_15min",
                         training_frame = dx_train, sample_rate = 0.632,
                         ntrees = 100, max_depth = 20)
})

# prediction
phat <- h2o.predict(md, dx_test)

# compute per-row accuracy: does the predicted class match the test label?
phat$Accuracy <- phat$predict == dx_test$dep_delayed_15min
# display Accuracy (0.70)
mean(phat$Accuracy)

# display AUC (0.73)
system.time({
  print(h2o.performance(md, dx_test)@metrics$AUC)
})

szilard commented 7 years ago

mean(phat$Accuracy) does not give you AUC, see https://www.kaggle.com/wiki/AreaUnderCurve
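
To make the distinction concrete, here is a minimal base-R sketch on invented toy data (the labels, the scores, and the `auc` helper are illustrative assumptions, not part of the benchmark or of the h2o package). Accuracy counts correct hard predictions at one threshold, whereas AUC measures how well the scores rank positives above negatives; this sketch computes AUC with the rank-sum (Mann-Whitney) formulation.

```r
# Toy labels and predicted probabilities (made-up values for illustration only)
y <- c(0, 0, 0, 0, 1, 1)
p <- c(0.1, 0.2, 0.3, 0.6, 0.4, 0.7)

# Accuracy: share of correct class predictions at a fixed 0.5 threshold
accuracy <- mean((p > 0.5) == (y == 1))

# Hypothetical helper: AUC as the probability that a random positive
# is scored above a random negative (rank-sum / Mann-Whitney formulation)
auc <- function(y, p) {
  r <- rank(p)                 # ties receive average ranks
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

print(accuracy)   # 4 of 6 correct at the 0.5 threshold
print(auc(y, p))  # 0.875
```

On this toy data the two numbers differ (about 0.667 versus 0.875), which is why the mean of a correct/incorrect column cannot stand in for AUC.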

tobigithub commented 7 years ago

Hi, I see, thank you: "give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime". I corrected the code above. Tobias

szilard commented 7 years ago

OK, cool.