jpmml / r2pmml

R library for converting R models to PMML
GNU Affero General Public License v3.0

convert error: org.jpmml.rexp.RStringVector cannot be cast to org.jpmml.rexp.RIntegerVector #24

Closed pjpan closed 7 years ago

pjpan commented 7 years ago

Hi, when I try to convert an XGBoost model, it raises an error.

I rechecked "f_maps": it only has the types "q" (float) and "int" (int), and there is no string vector in the training data.

Why is there an RStringVector at all? I am confused.

vruusmann commented 7 years ago

The feature map is a data.frame, which has three columns:

  1. "id" - feature id/index - integer
  2. "name" - feature name - factor
  3. "type" - feature type - factor

The above error means that the second column of your feature map is a string (character) vector, not a factor as it should be.

If you run the Iris example from the README file, then you can see that the generated feature map corresponds to the above definition:

library("r2pmml")

data(iris)

iris_X = iris[, 1:4]
iris_y = as.integer(iris[, 5]) - 1

# Generate XGBoost feature map
iris.fmap = genFMap(iris_X)

It needs to be investigated why the genFMap() function behaves badly in your situation.
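A quick way to check a feature map against the column definition above is to look at the class and the underlying storage type of each column. The following is a minimal sketch: the two small data frames are made up for illustration and are not taken from the reporter's data. It relies on the fact that R stores a factor as an integer vector with a "levels" attribute, which is presumably why the converter expects an RIntegerVector there.

```r
# Hypothetical feature map that matches the expected layout (factor columns)
fmap_ok = data.frame(
    "id" = 0:1,
    "name" = factor(c("Sepal.Length", "Sepal.Width")),
    "type" = factor(c("q", "q"))
)

# Hypothetical feature map with plain character columns
fmap_bad = data.frame(
    "id" = 0:1,
    "name" = c("Sepal.Length", "Sepal.Width"),
    "type" = c("q", "q"),
    stringsAsFactors = FALSE
)

# Column classes: "name" and "type" should be reported as "factor"
print(sapply(fmap_ok, class))
print(sapply(fmap_bad, class))

# A factor is integer-backed, a character vector is not
print(typeof(fmap_ok$name))   # "integer"
print(typeof(fmap_bad$name))  # "character"
```

If the second call reports "character" for the "name" or "type" column, the feature map does not match the definition above.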

pjpan commented 7 years ago

Thanks, @vruusmann, you are right. The column types did change when using genFMap:

> str(f_maps)
'data.frame':   229 obs. of  3 variables:
 $ id  : int  0 1 2 3 4 5 6 7 8 9 ...
 $ name: chr  "hqck_flag" "dqck_flag" "tzck_flag" "bzjck_flag" ...
 $ type: chr  "int" "int" "int" "int" ...

I will figure it out by myself.

By the way, I used genFMap from the r2pmml package:

genFMap = function(df_X){
    col2name = function(x){
        col = df_X[[x]]
        if(is.factor(col)){
            return (lapply(levels(col), FUN = function(level){ paste(x, "=", level, sep = "") }))
        }
        return (x)
    }
    feature_names = lapply(names(df_X), FUN = col2name)

    col2type = function(x){
        switch(class(x), "factor" = rep("i", length(levels(x))), "numeric" = "q", "integer" = "int")
    }
    feature_types = lapply(df_X, FUN = col2type)

    fmap = data.frame("name" = unlist(feature_names), "type" = unlist(feature_types))
    fmap = cbind("id" = seq(from = 0, to = (nrow(fmap) - 1)), fmap)

    return (fmap)
}
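One possible explanation, which is an assumption and not confirmed in this thread, is that the data.frame() call in the function above did not convert strings to factors. That happens when stringsAsFactors is FALSE, either because it was set through options() on an older R version or because it is the default from R 4.0.0 onward. Under that assumption, a minimal workaround sketch is to convert the two columns to factors after generating the feature map. The three rows below are illustrative only; they reuse feature names from the str() output above, and the "q" type is made up.

```r
# Illustrative feature map with character columns, mimicking the str() output above
f_maps = data.frame(
    "id" = 0:2,
    "name" = c("hqck_flag", "dqck_flag", "tzck_flag"),
    "type" = c("int", "int", "q"),  # the "q" entry is made up for illustration
    stringsAsFactors = FALSE
)

# Convert the character columns to factors; the values themselves are unchanged
f_maps$name = as.factor(f_maps$name)
f_maps$type = as.factor(f_maps$type)

# "name" and "type" should now be listed as Factor
str(f_maps)
```

The conversion keeps the row order and the string values, so the "id" column still lines up with the feature names.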