The converter assumes that class labels are of string datatype.
As a temporary workaround, can you make the example code work if you convert the target column from boolean datatype to string datatype?
Something like this:
iris_df = iris_df["Species"].astype(str)
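Or, assigning back to the column so that the rest of the DataFrame is kept (a sketch only, assuming iris_df is the pandas DataFrame from the example notebook):
iris_df["Species"] = iris_df["Species"].astype(str)  # cast only the target column to string labels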
Same error:
iris_y = iris[:, 4].astype(str)
iris_y.dtype # dtype('S1')
CalledProcessError Traceback (most recent call last)
<ipython-input-203-1868a03b599c> in <module>()
----> 1 sklearn2pmml(estimator = iris_clf, mapper = iris_mapper, pmml = "code_output/irisXGB.pmml", with_repr = True)
/Users/kwilliams/Library/Python/2.7/lib/python/site-packages/sklearn2pmml/__init__.pyc in sklearn2pmml(estimator, mapper, pmml, with_repr, debug)
63 if(debug):
64 print(" ".join(cmd))
---> 65 subprocess.check_call(cmd)
66 finally:
67 if(debug):
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.pyc in check_call(*popenargs, **kwargs)
538 if cmd is None:
539 cmd = popenargs[0]
--> 540 raise CalledProcessError(retcode, cmd)
541 return 0
542
SEVERE: Failed to convert Estimator
java.lang.ClassCastException: numpy.core.NDArray cannot be cast to java.util.List
at xgboost.sklearn.XGBClassifier.getClasses(XGBClassifier.java:55)
at sklearn.Classifier.createSchema(Classifier.java:43)
at sklearn.EstimatorUtil.encodePMML(EstimatorUtil.java:47)
at org.jpmml.sklearn.Main.run(Main.java:189)
at org.jpmml.sklearn.Main.main(Main.java:107)
Exception in thread "main" java.lang.ClassCastException: numpy.core.NDArray cannot be cast to java.util.List
at xgboost.sklearn.XGBClassifier.getClasses(XGBClassifier.java:55)
at sklearn.Classifier.createSchema(Classifier.java:43)
at sklearn.EstimatorUtil.encodePMML(EstimatorUtil.java:47)
at org.jpmml.sklearn.Main.run(Main.java:189)
at org.jpmml.sklearn.Main.main(Main.java:107)
Also tried iris_df["Species"] = iris_df["Species"].astype(str)
Your example script is missing the definition of iris_mapper. So, I used the following one:
from sklearn.preprocessing import StandardScaler
from sklearn2pmml.decoration import ContinuousDomain
from sklearn_pandas import DataFrameMapper

iris_mapper = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), StandardScaler()]),
    ("Species", None)
])
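For completeness, a minimal sketch of how this mapper plugs into the rest of the example; the variable names iris, iris_X, iris_y and iris_clf follow the snippets and traceback earlier in this thread, while the training call itself and the output path are assumptions:
from xgboost import XGBClassifier
from sklearn2pmml import sklearn2pmml

# run the mapper: columns 0-3 are the scaled features, column 4 is the passed-through target
iris = iris_mapper.fit_transform(iris_df)
iris_X = iris[:, 0:4].astype(float)  # make sure the feature columns are numeric
iris_y = iris[:, 4].astype(str)      # string class labels, per the workaround above

# fit the XGBoost classifier and export the fitted mapper + estimator pair to PMML
iris_clf = XGBClassifier()
iris_clf.fit(iris_X, iris_y)
sklearn2pmml(estimator = iris_clf, mapper = iris_mapper, pmml = "irisXGB.pmml", with_repr = True)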
After that, everything works fine on my computer (environment printed using sklearn2pmml(debug = True)):
python 2.7.11
sklearn 0.17.1
sklearn.externals.joblib 0.9.4
sklearn_pandas 1.1.0
sklearn2pmml 0.9.7
xgboost 0.4
Perhaps they've changed the XGBoost serialization functionality between versions 0.4 and 0.6.
Yes, iris_mapper was not included in my example, but the exact one you suggested was in the notebook I used to produce the error. I will try with xgboost 0.4.
The DataField element for the "Species" column looks like this in the resulting PMML file:
<DataField name="Species" optype="categorical" dataType="double">
<Value value="1"/>
<Value value="2"/>
</DataField>
Target category names "1" and "2" are not so intuitive.
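For comparison, with string class labels preserved, the same element would presumably look something like this (illustrative only; the actual category names depend on which two classes were in the training data):
<DataField name="Species" optype="categorical" dataType="string">
<Value value="versicolor"/>
<Value value="virginica"/>
</DataField>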
I just reproduced your result by using the xgboost 0.4a30 version instead of 0.6, and was able to successfully build the PMML file.
Hello, I am having a similar issue and am getting the following error if I use his example code. My xgboost version is 0.4a30; is this something that will be fixed if we upgrade the version of xgboost?
Sep 19, 2016 10:33:09 AM org.jpmml.sklearn.Main run
INFO: Converting Estimator..
Sep 19, 2016 10:33:09 AM org.jpmml.sklearn.Main run
SEVERE: Failed to convert Estimator
java.lang.RuntimeException: java.io.IOException
at xgboost.sklearn.Booster.loadLearner(Booster.java:53)
at xgboost.sklearn.Booster.getLearner(Booster.java:41)
at xgboost.sklearn.BoosterUtil.getNumberOfFeatures(BoosterUtil.java:35)
at xgboost.sklearn.XGBClassifier.getNumberOfFeatures(XGBClassifier.java:38)
at sklearn.Classifier.createSchema(Classifier.java:59)
at sklearn.EstimatorUtil.encodePMML(EstimatorUtil.java:47)
at org.jpmml.sklearn.Main.run(Main.java:189)
at org.jpmml.sklearn.Main.main(Main.java:107)
Caused by: java.io.IOException
at org.jpmml.xgboost.XGBoostDataInput.readReserved(XGBoostDataInput.java:68)
at org.jpmml.xgboost.GBTree.load(GBTree.java:61)
at org.jpmml.xgboost.Learner.load(Learner.java:92)
at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:34)
at xgboost.sklearn.Booster.loadLearner(Booster.java:51)
... 7 more
Exception in thread "main" java.lang.RuntimeException: java.io.IOException
at xgboost.sklearn.Booster.loadLearner(Booster.java:53)
at xgboost.sklearn.Booster.getLearner(Booster.java:41)
at xgboost.sklearn.BoosterUtil.getNumberOfFeatures(BoosterUtil.java:35)
at xgboost.sklearn.XGBClassifier.getNumberOfFeatures(XGBClassifier.java:38)
at sklearn.Classifier.createSchema(Classifier.java:59)
at sklearn.EstimatorUtil.encodePMML(EstimatorUtil.java:47)
at org.jpmml.sklearn.Main.run(Main.java:189)
at org.jpmml.sklearn.Main.main(Main.java:107)
Caused by: java.io.IOException
at org.jpmml.xgboost.XGBoostDataInput.readReserved(XGBoostDataInput.java:68)
at org.jpmml.xgboost.GBTree.load(GBTree.java:61)
at org.jpmml.xgboost.Learner.load(Learner.java:92)
at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:34)
at xgboost.sklearn.Booster.loadLearner(Booster.java:51)
... 7 more
I've tested both XGBoost 0.4 and 0.6, and I cannot reproduce this exception (i.e. a java.io.IOException that signals that the Booster binary object contains non-zero bytes in the "reserved" area). Maybe it's an architecture/OS issue (I'm on 64-bit GNU/Linux).
You would need to provide a Booster file that I could study locally.
I just got xgboost installed on OSX and it appeared to work. The server I am running the code on is CentOS/64-bit. I am happy to send the booster binary object; where are they located?
If you're using sklearn2pmml package version 0.9.7 or newer, then simply activate the debug option:
sklearn2pmml(estimator, mapper, debug = True)
The converter will then preserve temporary joblib dump files. Attach them here (or if GitHub won't let you do that for "security reasons", send to my e-mail).
I believe these are the files you want. Btw, thank you for responding so quickly!
With the update to jpmml-xgboost, it looks like it works with the R script and with a simple Python version I made. It doesn't appear to work yet with XGBClassifier, but I can just make a function to generate the feature map and use jpmml-xgboost. I will give it a try with my full-size models. Thanks for the help!
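For reference, such a feature map generator might look roughly like this (a sketch only; the helper name write_feature_map is made up, but the tab-separated index/name/type layout is the format that XGBoost feature map files use):
def write_feature_map(feature_names, path):
    # one feature per line: "<index>\t<name>\t<type>", where "q" marks a
    # quantitative (continuous) feature and "i" an indicator feature
    with open(path, "w") as fmap:
        for index, name in enumerate(feature_names):
            fmap.write("{}\t{}\tq\n".format(index, name))

write_feature_map(["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], "iris.fmap")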
Just a follow-up: after I was able to get XGBoost to the current version (0.6) by building some new compilers, everything worked without issues.
When using the xgboost.XGBClassifier wrapper, the estimator fails to convert. I get the error:
My Version Info
Example