kdutkowski closed this issue 2 years ago
I think it should be mtcars.fmap = as.fmap(mtcars.matrix) instead of mtcars.fmap = as.fmap(mtcars.frame)
This is quite an old code example. I believe that older XGBoost versions did not perform this sanity check, and were happy to accept an extra column.
This extra column represents the label. It's the last one (i.e. in the rightmost position), so it does not distort the indices or interpretations of the earlier feature columns. If this extra column were in the first position, then the exported model would reference the wrong features and would make incorrect predictions.
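To illustrate the point about column positions, here is a minimal sketch in base R. It makes no XGBoost or PMML calls; the column names and the index_of helper are hypothetical, and only show how a zero-based feature index resolves under each column layout.

```r
# Hypothetical column names; "mpg" plays the role of the label.
feature_names <- c("cyl", "disp", "hp")

# Hypothetical helper: zero-based position of a column name,
# mirroring how a tree model refers to features by index.
index_of <- function(columns, name) match(name, columns) - 1L

# Layout the model was trained on: features only.
trained <- feature_names

# Label appended as the last column: feature indices are unchanged.
label_last <- c(feature_names, "mpg")

# Label placed in the first column: every feature index shifts by one.
label_first <- c("mpg", feature_names)

# Compare where "hp" ends up under each layout.
layouts <- list(trained = trained, label_last = label_last, label_first = label_first)
hp_index <- sapply(layouts, index_of, name = "hp")
print(hp_index)
```

With the label last, "hp" keeps the same index as in the training layout, so a feature map built from that frame still lines up with the model. With the label first, the index of "hp" is off by one, so the model would read the neighbouring column instead.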
It works alright when I change it. Am I right?
If the conversion process did not raise any errors, and the PMML model makes correct predictions when invoked with sample data, then it most definitely is correct. My approval is not needed.
There are more R code examples here (more complex stuff like categorical features, missing values, etc.): https://github.com/jpmml/jpmml-xgboost/blob/1.6.0/pmml-xgboost-testing/src/test/resources/xgboost.R
Some more examples are available in the JPMML-R project: https://github.com/jpmml/jpmml-r/blob/1.4.5/src/test/R/xgboost.R
One thing you could try is embedding a model verification dataset into the PMML document: https://github.com/jpmml/jpmml-r/blob/1.4.5/src/test/R/xgboost.R#L90
BTW: in future XGBoost versions (1.5.X and up), it should be possible to get rid of the "feature map" functionality, because the XGBoost model file will contain basic information about the training dataset: feature names, category levels for categorical features, etc.
Will close this issue with an updated code example someday later.
OK, that's perfect. My only goal was to give you a heads-up about the issue I stumbled on. Thanks for your quick response!
I tried running the mtcars training sample script from your example in the README and it gives me the following error:
I think it should be
mtcars.fmap = as.fmap(mtcars.matrix)
instead of
mtcars.fmap = as.fmap(mtcars.frame)
It works alright when I change it. Am I right?