jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

Add support for post-processing of model output #37

Closed. latenighthacks closed this issue 6 years ago

latenighthacks commented 6 years ago

Hi Villu, thank you for your excellent work on JPMML.

I have a multiple linear regression model in R that uses two features and predicts a response variable. Due to a project constraint, I must actually train the model to predict log10(response), so that the regression minimizes the RMSLE. However, the model should still output a prediction for the untransformed response.
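
A minimal sketch of this kind of setup in base R, assuming synthetic data and placeholder column names (x1, x2, response and log_response are illustrative, not taken from the actual project):

```r
# Synthetic data, for illustration only
set.seed(1)
df <- data.frame(x1 = runif(50), x2 = runif(50))
df$response <- 10^(1 + 2 * df$x1 - df$x2 + rnorm(50, sd = 0.1))

# Train on the log10-transformed response
df$log_response <- log10(df$response)
m <- lm(log_response ~ x1 + x2, data = df)

# The model predicts log10(response); the desired output is the
# back-transformed value on the original scale
pred_log10 <- predict(m, newdata = df)
pred <- 10^pred_log10
```

Here the back-transformation is done by hand in R; the question is how to get the exported PMML to do the equivalent step.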

My understanding is that JPMML-Evaluator can handle such post-processing of model outputs, but I have not been able to find any way to include post-processing instructions when I use r2pmml.

Would it be possible to include some way of post-processing outputs to apply a simple transformation? Or, if that is not feasible right now, would you recommend simply editing the XML by hand to add the transformation?

Thanks in advance for your time!

vruusmann commented 6 years ago

Related to: https://github.com/jpmml/jpmml-r/issues/7

> I must train the model to predict log10(response). However, the model should still output a prediction for the untransformed response.

The (multiple) linear regression model is encoded using the RegressionModel element. The simplest way to "undo" the log transform on the target variable is to set RegressionModel@normalizationMethod="exp" (the default is "none"). See the "Valid combinations" section of this page: http://dmg.org/pmml/v4-3/Regression.html#xsdType_REGRESSIONNORMALIZATIONMETHOD
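
One caveat worth checking: the PMML specification defines the "exp" normalization as predictedValue = exp(y), i.e. the natural exponential. That exactly inverts a natural-log target, while a log10 target is inverted by 10^y = exp(y * ln 10). A small base-R sketch of the difference, using made-up values:

```r
# Hypothetical response values, for illustration only
response <- c(10, 250, 4000)

# Natural-log target: exp() recovers the original response
y_ln <- log(response)
from_ln <- exp(y_ln)

# log10 target: exp() alone does not recover the response;
# the inverse is 10^y, which equals exp(y * log(10))
y_log10 <- log10(response)
wrong_scale <- exp(y_log10)
from_log10 <- 10^y_log10
from_log10_alt <- exp(y_log10 * log(10))

print(rbind(response, from_ln, wrong_scale, from_log10, from_log10_alt))
```

If the model was trained on log10(response), one option under this reading is to train on log(response) instead, so that "exp" is the exact inverse.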

latenighthacks commented 6 years ago

Hi Villu,

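Thanks for your quick reply! I edited the XML for the model in R using:

library(r2pmml)
library(XML)

# Export the fitted model m to PMML, then parse the file for in-place editing
r2pmml(m, "model.pmml")
modelXML <- xmlTreeParse("model.pmml", useInternalNodes = TRUE)
# Add normalizationMethod="exp" to a node
add_normalization <- function(xmlNode) { xmlAttrs(xmlNode) <- c(normalizationMethod = "exp") }
# Bind the document's namespace definition to the "jpmml" prefix for XPath
xpathApply(modelXML,
           "/jpmml:PMML/jpmml:RegressionModel",
           add_normalization,
           namespaces = c(jpmml = xmlNamespaceDefinitions(modelXML, simplify = TRUE)))
saveXML(modelXML, file = "model.pmml", indent = TRUE)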

and when we run the model in JPMML-Evaluator, the model output is transformed (as expected).

Thanks very much for your help. I will close the issue.