Open szilard opened 5 years ago
Spark on r4.8xlarge:
10M: data RAM 10G, time 20sec, AUC 0.7093, total RAM 22G
100M: data RAM 60G, time 155sec, AUC 0.7093, total RAM 110G
Compare to h2o:
library(h2o)
h2o.init()
dx_train <- h2o.importFile("train-10m.csv")
dx_test <- h2o.importFile("test.csv")
Xnames <- names(dx_train)[which(names(dx_train)!="dep_delayed_15min")]
system.time({
md <- h2o.glm(x = Xnames, y = "dep_delayed_15min", training_frame = dx_train, family = "binomial")
})
h2o.auc(h2o.performance(md, dx_test))
10M: data RAM 4G, time 6sec, AUC 0.7081992, total RAM 6G
100M (the 10M training file stacked 10 times):
dx_train0 <- h2o.importFile("train-10m.csv")
dx_train <- h2o.rbind(dx_train0, dx_train0, dx_train0, dx_train0, dx_train0, dx_train0, dx_train0, dx_train0, dx_train0, dx_train0)
data RAM 6G, time 36sec, AUC 0.7081992, total RAM 11G
| | 10M | | 100M | |
|---|---|---|---|---|
| | Spark | h2o | Spark | h2o |
| time [s] | 20 | 6 | 155 | 36 |
| AUC | 0.709 | 0.708 | 0.709 | 0.708 |
| data RAM [GB] | 10 | 4 | 60 | 6 |
| data+train RAM [GB] | 22 | 6 | 110 | 11 |
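
To make the comparison easier to read, here is a small base-R sketch that derives ratios from the numbers in the table. The data frame `res` and its column names are made up for illustration; the values are copied from the table, nothing is re-measured.

```r
# Numbers copied from the summary table (r4.8xlarge); column names are illustrative.
res <- data.frame(
  size      = c("10M", "100M"),
  spark_sec = c(20, 155),   # training time [s]
  h2o_sec   = c(6, 36),
  spark_ram = c(22, 110),   # data+train RAM [GB]
  h2o_ram   = c(6, 11)
)

# How many times faster h2o trained, and how many times less RAM it used.
res$speedup   <- res$spark_sec / res$h2o_sec
res$ram_ratio <- res$spark_ram / res$h2o_ram

# How training time grew when the data grew 10x (10M -> 100M).
scaling <- c(
  spark = res$spark_sec[2] / res$spark_sec[1],
  h2o   = res$h2o_sec[2] / res$h2o_sec[1]
)

print(res)
print(scaling)
```

Note that the h2o 100M frame is 10 stacked copies of the 10M file, so these ratios only summarize the runs reported here.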