jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

The result of converting a RandomForest model is too big #42

Closed Yao544303 closed 6 years ago

Yao544303 commented 6 years ago

I trained a random forest in R with 288 features:

library("randomForest")
library("r2pmml")

set.seed(1234)
fit <- randomForest(V290 ~ ., data = data_train[,-1], mtry = 13, ntree = 700)
r2pmml(fit, "fit.pmml")

The size of fit.pmml is 1.04 GB. Is that too big?

vruusmann commented 6 years ago

The size of fit.pmml is 1.04 GB. Is that too big?

The size of the PMML file is proportional to the size of the underlying R model object. You're working with a fairly big dataset, and you're ensembling 700 decision trees - it's no surprise that the size of the PMML file approaches 1 GB.
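To get a rough feel for that proportionality before exporting, one option is to count the nodes in the forest. The snippet below is only a sketch: it uses a small iris model as a stand-in (the 288-feature dataset from the question is not available here), and it assumes that each tree node ends up as roughly one Node element in the uncompacted PMML output, so the total node count is a proxy for file size rather than an exact predictor.

```r
library("randomForest")

# Small stand-in model; not the 700-tree model from the question
set.seed(1234)
iris.rf <- randomForest(Species ~ ., data = iris, ntree = 7)

# Number of nodes (internal splits plus leaves) in each tree
nodes_per_tree <- treesize(iris.rf, terminal = FALSE)

# Assumption: roughly one PMML Node element per tree node,
# so this total is a rough proxy for the size of the PMML file
total_nodes <- sum(nodes_per_tree)
print(nodes_per_tree)
print(total_nodes)

# In-memory size of the R model object, for comparison
print(object.size(iris.rf), units = "Kb")
```

Running the same two calls (treesize and object.size) on the real fit object should show how many nodes the 700 trees contain in total, which is the quantity the PMML file size scales with.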

Anyway, I've just implemented the compaction of randomForest objects, which can be activated by specifying the compact = TRUE argument:

library("randomForest")
library("r2pmml")

iris.rf = randomForest(Species ~ ., data = iris, ntree = 7)

r2pmml(iris.rf, "iris.pmml")
r2pmml(iris.rf, "iris-compact.pmml", compact = TRUE)

On my computer, these two PMML files compare as follows: