jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

PCA transformation included in the PMML provided by r2pmml #6

Open dalpozz opened 8 years ago

dalpozz commented 8 years ago

I'd like to apply PCA before training a classifier and include both the PCA transformation and the classifier in the PMML document generated by the r2pmml package. There is already an R package called pmmlTransformations that does a similar job. However, I see that this is already possible in the Python counterpart, sklearn2pmml, so I was wondering whether this feature will become available in r2pmml.

vruusmann commented 8 years ago

Sure, if your model has specific data pre-processing needs, then it would be desirable to have a way of including them in the PMML document.

The main problem is that R lacks proper abstractions in this area. So, every transformation has to be specified and implemented separately. In Python/Scikit-Learn you have everything collected nicely together into the sklearn.preprocessing package.

Can you provide example R code that uses the PCA transformation in your workflow? The obvious candidate would be the preProcess function of the caret package.
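For reference, a minimal sketch of a standalone PCA step with caret's preProcess function. The iris dataset and the variable names are illustrative assumptions, and thresh = 0.95 simply spells out caret's default; nothing here is specific to r2pmml.

```r
library(caret)

data(iris)

# Use the four numeric columns; Species is left out of the PCA
iris.X = iris[, 1:4]

# method = "pca" also centers and scales the inputs;
# thresh is the fraction of variance that the kept components must explain
iris.pca = preProcess(iris.X, method = "pca", thresh = 0.95)

# Project the data onto the retained principal components (PC1, PC2, ..)
iris.pcs = predict(iris.pca, iris.X)

print(iris.pca$numComp)
print(head(iris.pcs))
```

The fitted object stores everything needed to repeat the projection on new data, which is the information a PMML document would have to carry as well.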

dalpozz commented 8 years ago

Here is the R code:

#load some toy data
library(unbalanced)
data("ubIonosphere")

#train with caret after applying PCA
library(caret)
fit <- caret::train(Class ~ ., data = ubIonosphere, preProcess = "pca", method = "rf", ntree = 200)

#save the model as PMML (including pre-processing method)
library(r2pmml)
r2pmml(fit, "fit.pmml")

vruusmann commented 8 years ago

The r2pmml function takes an optional preProcess argument now.

Data pre-processing can also be done in standalone mode; it doesn't need to be coupled to the train function. For example:

library("caret")

data(iris)
iris.preProcess = preProcess(iris, method = c("range"))

r2pmml(.., preProcess = iris.preProcess, ..)

The current implementation supports range, scale, center and medianImpute transformations. Other transformations (including pca) should become available shortly.
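To see what the range transformation computes, here is a small sketch that applies a fitted preProcess object with predict and compares the result with a by-hand calculation. Restricting the data to the numeric columns and the names iris.X, iris.scaled and manual are assumptions made for illustration.

```r
library(caret)

data(iris)

# Fit the transformation on the numeric columns only
iris.X = iris[, 1:4]
iris.preProcess = preProcess(iris.X, method = c("range"))

# Apply it with predict(); every column should now span [0, 1]
iris.scaled = predict(iris.preProcess, iris.X)
print(sapply(iris.scaled, range))

# The same rescaling computed by hand: (x - min) / (max - min)
manual = as.data.frame(lapply(iris.X, function(x) (x - min(x)) / (max(x) - min(x))))

# Largest absolute difference between the two results
print(max(abs(as.matrix(iris.scaled) - as.matrix(manual))))
```

The center, scale and medianImpute methods can be inspected the same way by changing the method argument.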