jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

RandomForest to PMML #47

Clls1 closed this issue 5 years ago

Clls1 commented 5 years ago

Hello,

I have trained a randomForest model, but I am not able to export it to PMML. What could be the issue? Thank you so much.

rf <- randomForest(as.factor(fraude) ~ Total_Monto_A + Total_Saldo_A + as.factor(Tem_Modelo_Equipo) + Total_FaturasZero_R,
                   data = segmentacao_out_model, ntree = 5,
                   nodesize = 5, importance = TRUE)

Error:

> r2pmml(rf, "rf.pmml")
out 08, 2018 10:40:12 AM org.jpmml.rexp.Main run
INFO: Parsing RDS..
out 08, 2018 10:40:12 AM org.jpmml.rexp.Main run
INFO: Parsed RDS in 36 ms.
out 08, 2018 10:40:12 AM org.jpmml.rexp.Main run
INFO: Initializing default Converter
out 08, 2018 10:40:12 AM org.jpmml.rexp.Main run
INFO: Initialized org.jpmml.rexp.RandomForestConverter
out 08, 2018 10:40:12 AM org.jpmml.rexp.Main run
INFO: Converting..
out 08, 2018 10:40:12 AM org.jpmml.rexp.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: other
    at org.jpmml.rexp.RExpUtil.getDataType(RExpUtil.java:46)
    at org.jpmml.rexp.FormulaUtil.createFormula(FormulaUtil.java:71)
    at org.jpmml.rexp.RandomForestConverter.encodeFormula(RandomForestConverter.java:121)
    at org.jpmml.rexp.RandomForestConverter.encodeSchema(RandomForestConverter.java:70)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:69)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)

Exception in thread "main" java.lang.IllegalArgumentException: other
    at org.jpmml.rexp.RExpUtil.getDataType(RExpUtil.java:46)
    at org.jpmml.rexp.FormulaUtil.createFormula(FormulaUtil.java:71)
    at org.jpmml.rexp.RandomForestConverter.encodeFormula(RandomForestConverter.java:121)
    at org.jpmml.rexp.RandomForestConverter.encodeSchema(RandomForestConverter.java:70)
    at org.jpmml.rexp.ModelConverter.encodePMML(ModelConverter.java:69)
    at org.jpmml.rexp.Converter.encodePMML(Converter.java:39)
    at org.jpmml.rexp.Main.run(Main.java:149)
    at org.jpmml.rexp.Main.main(Main.java:97)
Error in .convert(tempfile, file, converter, converter_classpath, verbose) : 
  1

vruusmann commented 5 years ago

> java.lang.IllegalArgumentException: other
>     at org.jpmml.rexp.RExpUtil.getDataType(RExpUtil.java:46)

This means that the JPMML-R library cannot determine the data type of one or more columns. In other words, it does not know which PMML data type corresponds to the R data type that it reports as "other".
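One way to narrow this down is to list the class of every column in the training data and flag anything that is not a plain numeric, integer, logical, factor, or character column. The sketch below uses a made-up data frame (the column names are hypothetical, not the reporter's), and the list of "basic" classes is an assumption chosen for illustration, not the converter's actual rule.

```r
# Hypothetical stand-in for the training data; column names are illustrative only.
df <- data.frame(
  amount  = c(10.5, 20.0, 7.25),
  label   = factor(c("yes", "no", "yes")),
  started = as.Date(c("2018-01-01", "2018-02-01", "2018-03-01")),
  ended   = as.Date(c("2018-01-15", "2018-02-20", "2018-03-02"))
)
df$elapsed <- df$ended - df$started  # subtracting Dates yields a difftime column

# Take the first class of every column.
col_classes <- vapply(df, function(col) class(col)[1], character(1))
print(col_classes)

# Assumed set of unproblematic classes; anything else deserves a closer look.
basic <- c("numeric", "integer", "logical", "factor", "character")
suspects <- names(col_classes)[!(col_classes %in% basic)]
print(suspects)
```

Columns that show up in `suspects` are candidates for an explicit conversion before the model is trained.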

I believe that it's related to the fact that you're performing "cast to factor" operations inside the R formula:

rf <- randomForest(as.factor(fraude) ~ Total_Saldo_A + as.factor(Tem_Modelo_Equipo), data = segmentacao_out_model)

Does the conversion succeed if you perform those cast operations before the randomForest() function call? For example:

segmentacao_out_model$fraude = as.factor(segmentacao_out_model$fraude)
segmentacao_out_model$Tem_Modelo_Equipo = as.factor(segmentacao_out_model$Tem_Modelo_Equipo)

rf <- randomForest(fraude ~ Total_Saldo_A + Tem_Modelo_Equipo, data = segmentacao_out_model)

Clls1 commented 5 years ago

Thanks to your tip, I discovered the problem. The thing was that one of the variables was of the type difftime. I converted it to numeric and it worked! Thank you so much! Please continue the great work!
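A minimal sketch of that kind of fix, assuming a hypothetical data frame with two timestamp columns (none of these names come from the original data). Passing `units` explicitly to `as.numeric()` avoids depending on the unit that R picks automatically when two timestamps are subtracted.

```r
# Hypothetical data; "opened", "closed" and "duration" are illustrative names.
df <- data.frame(
  opened = as.POSIXct(c("2018-10-01 08:00:00", "2018-10-02 09:30:00"), tz = "UTC"),
  closed = as.POSIXct(c("2018-10-03 08:00:00", "2018-10-02 21:30:00"), tz = "UTC")
)

# Subtracting timestamps yields a difftime column.
df$duration <- df$closed - df$opened
print(class(df$duration))

# Convert to a plain numeric column with an explicit unit, then drop the difftime column.
df$duration_hours <- as.numeric(df$duration, units = "hours")
df$duration <- NULL
print(df$duration_hours)
```

The model would then be trained on `duration_hours` instead of the original difftime column.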

vruusmann commented 5 years ago

> The thing was that one of the variables was of the type difftime

Can you provide a reproducible example that uses difftime?

The PMML standard provides first-class date/time data types and can do arithmetic with them (e.g., calculating the number of days between two dates, or the number of seconds between two timestamps). I would be very interested in prototyping something in this area.
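For reference, the R side of those two calculations can be sketched with base R's `difftime()`. The dates and timestamps below are made up for illustration. Both results are difftime objects, which is the column type that triggered the error in this issue.

```r
# Number of days between two dates.
d1 <- as.Date("2018-10-01")
d2 <- as.Date("2018-10-08")
days_between <- difftime(d2, d1, units = "days")
print(days_between)

# Number of seconds between two timestamps.
t1 <- as.POSIXct("2018-10-08 10:40:12", tz = "UTC")
t2 <- as.POSIXct("2018-10-08 10:41:00", tz = "UTC")
secs_between <- difftime(t2, t1, units = "secs")
print(secs_between)

# Both values carry the "difftime" class rather than being plain numbers.
print(class(days_between))
```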

vruusmann commented 5 years ago

Related issues: https://github.com/jpmml/jpmml-r/issues/8 https://github.com/jpmml/jpmml-r/issues/9