jpmml / jpmml-xgboost

Java library and command-line application for converting XGBoost models to PMML
GNU Affero General Public License v3.0

Different scores from the PMML version of the model than from XGBoost (0.4a30) #8

Closed damienrj closed 8 years ago

damienrj commented 8 years ago

Hello, I have been using jpmml-xgboost to convert models trained with XGBoost 0.4a30 on a CentOS server. The scores generated with the booster's predict function bst.predict(xgeval, ntree_limit=0) are in some cases very different from those generated from the PMML version.

pmml
0.969313

python/xgboost
1.641659e-07

To get the model ready for conversion I create a feature map file, and save the model with bst.save_model(). Conversion happens without any errors.
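For reference, a feature map file is a tab-separated text file with one line per feature: a zero-based index, the feature name, and a type flag (q for quantitative, i for binary indicator, int for integer). A minimal sketch of writing one is shown here; the helper name write_feature_map and the example column names are assumptions made for illustration, not the setup actually used in this report.

```python
import os
import tempfile

VALID_TYPES = ("q", "i", "int")

def write_feature_map(path, feature_names, feature_types=None):
    """Write one tab-separated line per feature: index, name, type flag."""
    if feature_types is None:
        # Assumption: treat every feature as quantitative unless told otherwise.
        feature_types = ["q"] * len(feature_names)
    if len(feature_types) != len(feature_names):
        raise ValueError("feature_names and feature_types differ in length")
    with open(path, "w") as fmap:
        for index, (name, ftype) in enumerate(zip(feature_names, feature_types)):
            if ftype not in VALID_TYPES:
                raise ValueError("unknown feature type: {!r}".format(ftype))
            fmap.write("{}\t{}\t{}\n".format(index, name, ftype))

def demo_feature_map():
    """Write a feature map for four hypothetical columns and return its text."""
    names = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]
    path = os.path.join(tempfile.mkdtemp(), "fmap.txt")
    write_feature_map(path, names)
    with open(path) as fmap:
        return fmap.read()

feature_map_text = demo_feature_map()
```

The order of names passed in must match the column order of the training matrix, because the index written to the file is what ties a split in the model to a named field in the PMML.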

Do you have any suggestions as to what could be causing the difference?

Thanks!

Also, I want to mention that using jpmml-sklearn I was able to convert a GBM model into pmml, and the scores that I got were the same from the sklearn model and my pmml implementation.

vruusmann commented 8 years ago

What's the use of the ntree_limit argument? The PMML representation evaluates all "member" decision tree models. It doesn't support subsetting member models (or other early stopping criteria) at the moment, although this is something that could be added (eg. replacing True segment selection predicates with <SimplePredicate name="ntree_limit" operator="lessOrEqual" value="..."/>).
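To illustrate why the number of evaluated trees matters, here is a toy sketch (not XGBoost or JPMML code). It assumes a binary:logistic objective with base_score=0.5, so the probability is the sigmoid of the summed leaf values, and it treats ntree_limit=0 as "use all trees". The leaf values are hypothetical.

```python
import math

def sigmoid(margin):
    return 1.0 / (1.0 + math.exp(-margin))

def predict_proba(leaf_values, ntree_limit=0):
    """Score one row from the leaf value it reaches in each tree.

    ntree_limit=0 means that every tree contributes to the sum.
    """
    if ntree_limit < 0:
        raise ValueError("ntree_limit must be non-negative")
    used = leaf_values if ntree_limit == 0 else leaf_values[:ntree_limit]
    return sigmoid(sum(used))

# Hypothetical leaf values reached by a single row, one per tree.
leaf_values = [2.0, -1.5, -3.0, -4.0]

full_score = predict_proba(leaf_values)                      # all four trees
truncated_score = predict_proba(leaf_values, ntree_limit=1)  # first tree only
```

With these made-up values the truncated ensemble scores the row high while the full ensemble scores it close to zero, which is the kind of gap to expect if one side evaluates a subset of trees and the other evaluates all of them.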

I suspect there's something wrong with your feature map file. Does the prediction work correctly when you use Scikit-Learn's XGBRegressor and XGBClassifier estimator types, which do not need manual feature map specification. In other words, you should try the sklearn2pmml package for exporting XGBoost models.

Finally, what is the PMML evaluation engine that is misbehaving? Is it my JPMML-Evaluator library, or is it something else?

damienrj commented 8 years ago

Point 1, ntree_limit was in place because I was planning to use early_stopping. Currently it is set to its default value of 0, and worst case I can just retrain the model with the right number of trees.

For point 3, we are using your JPMML-Evaluator 1.3.1.

For point 2, I was using the jpmml-xgboost to convert the models because I still have errors converting XGBoost models that use XGBClassifier. jpmml-xgboost now converts the same model without issue if I don't use the XGBClassifier class.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn2pmml.decoration import ContinuousDomain
from sklearn.linear_model import LogisticRegressionCV
from sklearn2pmml import sklearn2pmml
import xgboost as xgb
import pandas
import sklearn_pandas

params = {'n_estimators': 50, 'learning_rate': 1, 'seed':0, 'subsample': 0.8, 'colsample_bytree': 0.8, 
             'objective': 'binary:logistic', 'max_depth':4, 'min_child_weight':300, 'nthread': 50}

iris = load_iris()

iris_df = pandas.concat((pandas.DataFrame(iris.data[:, :], columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]), pandas.DataFrame(iris.target, columns = ["Species"])), axis = 1)

iris_mapper = sklearn_pandas.DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), PCA(n_components = 3)]),
    ("Species", None)
])

iris = iris_mapper.fit_transform(iris_df)

iris_X = iris[:, 0:3]
iris_y = iris[:, 3]

model = xgb.XGBClassifier(**params)
model.fit(iris_X, iris_y, verbose=True)
sklearn2pmml(model, iris_mapper, "test.pmml", with_repr = True, debug=True)

# iris_classifier = LogisticRegressionCV()
# iris_classifier.fit(iris_X, iris_y)

This gives the following error:

('python: ', '2.7.11')
('sklearn: ', '0.17.1')
('sklearn.externals.joblib:', '0.9.4')
('sklearn_pandas: ', '1.1.0')
('sklearn2pmml: ', '0.11.1')
java -cp /home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/guava-19.0.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/istack-commons-runtime-2.21.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jaxb-core-2.2.11.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jaxb-runtime-2.2.11.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jcommander-1.48.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jpmml-converter-1.1.0.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jpmml-sklearn-1.1.1.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jpmml-xgboost-1.1.0.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-agent-1.3.1.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-model-1.3.1.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-model-metro-1.3.1.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-schema-1.3.1.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pyrolite-4.13.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/serpent-1.12.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/slf4j-api-1.7.21.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/slf4j-jdk14-1.7.21.jar org.jpmml.sklearn.Main --pkl-estimator-input /data/tmp/damien/estimator-iEK2qc.pkl.z --repr-estimator XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=0.8,
       gamma=0, learning_rate=1, max_delta_step=0, max_depth=4,
       min_child_weight=300, missing=None, n_estimators=50, nthread=50,
       objective='multi:softprob', reg_alpha=0, reg_lambda=1,
       scale_pos_weight=1, seed=0, silent=True, subsample=0.8) --pkl-mapper-input /data/tmp/damien/mapper-nKUFWw.pkl.z --repr-mapper DataFrameMapper(features=[(['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width'], TransformerPipeline(steps=[('continuousdomain', ContinuousDomain(invalid_value_treatment='return_invalid')), ('pca', PCA(copy=True, n_components=3, whiten=False))])), ('Species', None)],
        sparse=False) --pmml-output test.pmml
('Preserved joblib dump file(s): ', '/data/tmp/damien/estimator-iEK2qc.pkl.z /data/tmp/damien/mapper-nKUFWw.pkl.z')
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-10-e99243315e4f> in <module>()
     28 model = xgb.XGBClassifier(**params)
     29 model.fit(iris_X, iris_y, verbose=True)
---> 30 sklearn2pmml(model, iris_mapper, "test.pmml", with_repr = True, debug=True)
     31 
     32 # iris_classifier = LogisticRegressionCV()

/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/__init__.pyc in sklearn2pmml(estimator, mapper, pmml, with_repr, debug)
     63                 if(debug):
     64                         print(" ".join(cmd))
---> 65                 subprocess.check_call(cmd)
     66         finally:
     67                 if(debug):

/home/damien/python/lib/python2.7/subprocess.pyc in check_call(*popenargs, **kwargs)
    538         if cmd is None:
    539             cmd = popenargs[0]
--> 540         raise CalledProcessError(retcode, cmd)
    541     return 0
    542 

CalledProcessError: Command '['java', '-cp', '/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/guava-19.0.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/istack-commons-runtime-2.21.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jaxb-core-2.2.11.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jaxb-runtime-2.2.11.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jcommander-1.48.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jpmml-converter-1.1.0.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jpmml-sklearn-1.1.1.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jpmml-xgboost-1.1.0.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-agent-1.3.1.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-model-1.3.1.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-model-metro-1.3.1.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-schema-1.3.1.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pyrolite-4.13.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/serpent-1.12.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/slf4j-api-1.7.21.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/slf4j-jdk14-1.7.21.jar', 'org.jpmml.sklearn.Main', '--pkl-estimator-input', '/data/tmp/damien/estimator-iEK2qc.pkl.z', '--repr-estimator', "XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=0.8,\n       gamma=0, learning_rate=1, max_delta_step=0, max_depth=4,\n       min_child_weight=300, missing=None, n_estimators=50, nthread=50,\n       objective='multi:softprob', reg_alpha=0, reg_lambda=1,\n       scale_pos_weight=1, seed=0, silent=True, subsample=0.8)", 
'--pkl-mapper-input', '/data/tmp/damien/mapper-nKUFWw.pkl.z', '--repr-mapper', "DataFrameMapper(features=[(['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width'], TransformerPipeline(steps=[('continuousdomain', ContinuousDomain(invalid_value_treatment='return_invalid')), ('pca', PCA(copy=True, n_components=3, whiten=False))])), ('Species', None)],\n        sparse=False)", '--pmml-output', 'test.pmml']' returned non-zero exit status 1

output.zip

Some differences I found between the PMML of a SKlearn GBM and the PMML of the XGBoost model are:

SKlearn
        <DataField name="label" optype="categorical" dataType="integer">
            <Value value="0"/>
            <Value value="1"/>
        </DataField>

                    <Output>
                        <OutputField name="probability_0" feature="probability" value="0"/>
                        <OutputField name="probability_1" feature="probability" value="1"/>
                    </Output>

XGBoost

        <DataField name="label" optype="categorical" dataType="string">
            <Value value="0"/>
            <Value value="1"/>
        </DataField>

                    <Output>
                        <OutputField name="probability_0" optype="continuous" dataType="double" feature="probability" value="0"/>
                        <OutputField name="probability_1" optype="continuous" dataType="double" feature="probability" value="1"/>
                    </Output>
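
The two fragments can also be compared mechanically. The sketch below uses only the Python standard library and the attribute values quoted above; the helper name attribute_differences is an assumption for illustration. It only shows where the declarations differ and does not establish whether the difference affects scoring.

```python
import xml.etree.ElementTree as ET

SKLEARN_FIELD = '<DataField name="label" optype="categorical" dataType="integer"/>'
XGBOOST_FIELD = '<DataField name="label" optype="categorical" dataType="string"/>'

SKLEARN_OUTPUT = '<OutputField name="probability_0" feature="probability" value="0"/>'
XGBOOST_OUTPUT = ('<OutputField name="probability_0" optype="continuous" '
                  'dataType="double" feature="probability" value="0"/>')

def attribute_differences(left_xml, right_xml):
    """Return {attribute: (left value, right value)} for attributes that differ."""
    left = ET.fromstring(left_xml).attrib
    right = ET.fromstring(right_xml).attrib
    names = sorted(set(left) | set(right))
    return {name: (left.get(name), right.get(name))
            for name in names
            if left.get(name) != right.get(name)}

field_diff = attribute_differences(SKLEARN_FIELD, XGBOOST_FIELD)
output_diff = attribute_differences(SKLEARN_OUTPUT, XGBOOST_OUTPUT)
```

Here field_diff reports that the label is declared as integer in one document and string in the other, and output_diff reports that only the XGBoost document declares optype and dataType on the probability fields.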

vruusmann commented 8 years ago

('sklearn2pmml: ', '0.11.1')

Please upgrade to sklearn2pmml version 0.12.0! The last couple of releases (0.11.2 and 0.12.0) were specifically about ensuring compatibility with XGBoost 0.6.

damienrj commented 8 years ago

I still get the same error with 0.12.0. Also, we are using XGBoost 0.4a30, not 0.6, because we currently can't install 0.6: the compiler that it needs is not supported by our version of CentOS. I have provided the updated files generated using 0.12.0.

('python: ', '2.7.11')
('sklearn: ', '0.17.1')
('sklearn.externals.joblib:', '0.9.4')
('sklearn_pandas: ', '1.1.0')
('sklearn2pmml: ', '0.12.0')
java -cp /home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/guava-19.0.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/istack-commons-runtime-2.21.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jaxb-core-2.2.11.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jaxb-runtime-2.2.11.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jcommander-1.48.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jpmml-converter-1.1.1.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jpmml-sklearn-1.1.3.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jpmml-xgboost-1.1.1.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-agent-1.3.3.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-model-1.3.3.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-model-metro-1.3.3.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-schema-1.3.3.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pyrolite-4.13.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/serpent-1.12.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/slf4j-api-1.7.21.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/slf4j-jdk14-1.7.21.jar org.jpmml.sklearn.Main --pkl-estimator-input /data/tmp/damien/estimator-jceJav.pkl.z --repr-estimator XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=0.8,
       gamma=0, learning_rate=1, max_delta_step=0, max_depth=4,
       min_child_weight=300, missing=None, n_estimators=50, nthread=50,
       objective='multi:softprob', reg_alpha=0, reg_lambda=1,
       scale_pos_weight=1, seed=0, silent=True, subsample=0.8) --pkl-mapper-input /data/tmp/damien/mapper-5hehV5.pkl.z --repr-mapper DataFrameMapper(features=[(['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width'], TransformerPipeline(steps=[('continuousdomain', ContinuousDomain(invalid_value_treatment='return_invalid')), ('pca', PCA(copy=True, n_components=3, whiten=False))])), ('Species', None)],
        sparse=False) --pmml-output test.pmml
('Preserved joblib dump file(s): ', '/data/tmp/damien/estimator-jceJav.pkl.z /data/tmp/damien/mapper-5hehV5.pkl.z')
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-1-e99243315e4f> in <module>()
     28 model = xgb.XGBClassifier(**params)
     29 model.fit(iris_X, iris_y, verbose=True)
---> 30 sklearn2pmml(model, iris_mapper, "test.pmml", with_repr = True, debug=True)
     31 
     32 # iris_classifier = LogisticRegressionCV()

/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/__init__.pyc in sklearn2pmml(estimator, mapper, pmml, with_repr, debug)
     63                 if(debug):
     64                         print(" ".join(cmd))
---> 65                 subprocess.check_call(cmd)
     66         finally:
     67                 if(debug):

/home/damien/python/lib/python2.7/subprocess.pyc in check_call(*popenargs, **kwargs)
    538         if cmd is None:
    539             cmd = popenargs[0]
--> 540         raise CalledProcessError(retcode, cmd)
    541     return 0
    542 

CalledProcessError: Command '['java', '-cp', '/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/guava-19.0.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/istack-commons-runtime-2.21.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jaxb-core-2.2.11.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jaxb-runtime-2.2.11.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jcommander-1.48.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jpmml-converter-1.1.1.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jpmml-sklearn-1.1.3.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/jpmml-xgboost-1.1.1.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-agent-1.3.3.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-model-1.3.3.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-model-metro-1.3.3.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pmml-schema-1.3.3.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/pyrolite-4.13.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/serpent-1.12.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/slf4j-api-1.7.21.jar:/home/damien/.local/lib/python2.7/site-packages/sklearn2pmml/resources/slf4j-jdk14-1.7.21.jar', 'org.jpmml.sklearn.Main', '--pkl-estimator-input', '/data/tmp/damien/estimator-jceJav.pkl.z', '--repr-estimator', "XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=0.8,\n       gamma=0, learning_rate=1, max_delta_step=0, max_depth=4,\n       min_child_weight=300, missing=None, n_estimators=50, nthread=50,\n       objective='multi:softprob', reg_alpha=0, reg_lambda=1,\n       scale_pos_weight=1, seed=0, silent=True, subsample=0.8)", 
'--pkl-mapper-input', '/data/tmp/damien/mapper-5hehV5.pkl.z', '--repr-mapper', "DataFrameMapper(features=[(['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width'], TransformerPipeline(steps=[('continuousdomain', ContinuousDomain(invalid_value_treatment='return_invalid')), ('pca', PCA(copy=True, n_components=3, whiten=False))])), ('Species', None)],\n        sparse=False)", '--pmml-output', 'test.pmml']' returned non-zero exit status 1
Oct 14, 2016 2:52:49 PM org.jpmml.sklearn.Main run
INFO: Parsing DataFrameMapper PKL..
Oct 14, 2016 2:52:49 PM org.jpmml.sklearn.Main run
INFO: Parsed DataFrameMapper PKL in 42 ms.
Oct 14, 2016 2:52:49 PM org.jpmml.sklearn.Main run
INFO: Converting DataFrameMapper..
Oct 14, 2016 2:52:49 PM org.jpmml.sklearn.Main run
INFO: Converted DataFrameMapper in 26 ms.
Oct 14, 2016 2:52:49 PM org.jpmml.sklearn.Main run
INFO: Parsing Estimator PKL..
Oct 14, 2016 2:52:49 PM org.jpmml.sklearn.Main run
INFO: Parsed Estimator PKL in 7 ms.
Oct 14, 2016 2:52:49 PM org.jpmml.sklearn.Main run
INFO: Converting Estimator..
Oct 14, 2016 2:52:49 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert Estimator
java.lang.RuntimeException: java.io.IOException
        at xgboost.sklearn.Booster.loadLearner(Booster.java:53)
        at xgboost.sklearn.Booster.getLearner(Booster.java:41)
        at xgboost.sklearn.BoosterUtil.getNumberOfFeatures(BoosterUtil.java:35)
        at xgboost.sklearn.XGBClassifier.getNumberOfFeatures(XGBClassifier.java:38)
        at sklearn.Classifier.createSchema(Classifier.java:59)
        at sklearn.EstimatorUtil.encodePMML(EstimatorUtil.java:47)
        at org.jpmml.sklearn.Main.run(Main.java:189)
        at org.jpmml.sklearn.Main.main(Main.java:107)
Caused by: java.io.IOException
        at org.jpmml.xgboost.XGBoostDataInput.readReserved(XGBoostDataInput.java:82)
        at org.jpmml.xgboost.GBTree.load(GBTree.java:61)
        at org.jpmml.xgboost.Learner.load(Learner.java:98)
        at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:34)
        at xgboost.sklearn.Booster.loadLearner(Booster.java:51)
        ... 7 more

Exception in thread "main" java.lang.RuntimeException: java.io.IOException
        at xgboost.sklearn.Booster.loadLearner(Booster.java:53)
        at xgboost.sklearn.Booster.getLearner(Booster.java:41)
        at xgboost.sklearn.BoosterUtil.getNumberOfFeatures(BoosterUtil.java:35)
        at xgboost.sklearn.XGBClassifier.getNumberOfFeatures(XGBClassifier.java:38)
        at sklearn.Classifier.createSchema(Classifier.java:59)
        at sklearn.EstimatorUtil.encodePMML(EstimatorUtil.java:47)
        at org.jpmml.sklearn.Main.run(Main.java:189)
        at org.jpmml.sklearn.Main.main(Main.java:107)
Caused by: java.io.IOException
        at org.jpmml.xgboost.XGBoostDataInput.readReserved(XGBoostDataInput.java:82)
        at org.jpmml.xgboost.GBTree.load(GBTree.java:61)
        at org.jpmml.xgboost.Learner.load(Learner.java:98)
        at org.jpmml.xgboost.XGBoostUtil.loadLearner(XGBoostUtil.java:34)
        at xgboost.sklearn.Booster.loadLearner(Booster.java:51)
        ... 7 more

output_0.12.0.zip

vruusmann commented 8 years ago

Your example Python script runs just fine in my Python 2.7 environment:

('python: ', '2.7.11')
('sklearn: ', '0.18')
('sklearn.externals.joblib:', '0.10.2')
('sklearn_pandas: ', '1.1.0')
('sklearn2pmml: ', '0.12.0')

I'm using xgboost-0.6a2 that was downloaded minutes ago:

pip2.7 install --upgrade xgboost

What is the version of your Scikit-Learn's XGBoost package?

damienrj commented 8 years ago

How do I check the version of Scikit-Learn's XGBoost package? I tried both scikit-learn 0.17 and 0.18. Or do you mean the jpmml-sklearn package? I believe that is 1.1.

Also, would it be an issue if I was using weights in the XGBoost dmatrix?

vruusmann commented 8 years ago

Printing the version of Scikit-Learn's XGBoost package:

import xgboost
print(xgboost.__version__)

You can use row weights if you want to. Weights are used during model training; they are not part of the "persistent state" of the model, and therefore they are not transferred over to the PMML representation of the model.

However, on a practical note, I would advise you to position the weights column as the last column of XGBoost dmatrix. It may well be the case that the weight column is shifting data columns in your feature map specification file, which leads to incorrect PMML conversion results.
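A small sketch of the shift described above, assuming the feature map was numbered from the full column list rather than from the columns that actually reach the DMatrix. The column names signal_a and signal_b are hypothetical; precision_weight and label are the names that appear elsewhere in this thread.

```python
def feature_index_map(columns, non_feature_columns):
    """Map each feature name to its zero-based index after dropping
    columns (label, weight) that are not passed to the model as features."""
    features = [name for name in columns if name not in non_feature_columns]
    return {name: index for index, name in enumerate(features)}

# Hypothetical column layout with the weight column in the middle.
columns = ["signal_a", "precision_weight", "signal_b", "label"]

# Numbering every column, as a naive feature map generator might do.
naive = {name: index for index, name in enumerate(columns)}

# Numbering only the columns the model was trained on.
correct = feature_index_map(columns, {"precision_weight", "label"})

# Features whose index differs between the two numberings.
shifted = {name: (naive[name], correct[name])
           for name in correct
           if naive[name] != correct[name]}
```

Every feature listed in shifted would be attached to the wrong split in the converted model, so an empty result is what a consistent feature map should give.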

vruusmann commented 8 years ago

As for the following java.io.IOException, this is something that I cannot/will not fix:

Caused by: java.io.IOException
        at org.jpmml.xgboost.XGBoostDataInput.readReserved(XGBoostDataInput.java:82)
        at org.jpmml.xgboost.GBTree.load(GBTree.java:61)

Per the latest XGBoost source code, there has to be a 32-element array of zero bytes in that location: https://github.com/dmlc/xgboost/blob/master/src/gbm/gbtree.cc#L107

If this assumption does not hold, then there's something wrong with your XGBoost installation. XGBoost source code suggests that it might be some sort of 32-bit/64-bit compatibility issue: https://github.com/dmlc/xgboost/blob/master/src/gbm/gbtree.cc#L111
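The kind of check involved can be sketched generically as follows. This is not the JPMML-XGBoost implementation: the offset of the reserved block, the element width (4-byte little-endian integers here) and the helper name are all assumptions; the real layout is defined by the XGBoost source linked above.

```python
import struct

def reserved_block_is_zero(buf, offset, count=32, element="i"):
    """Return True if `count` elements starting at `offset` are all zero.

    `element` is a struct format character; "i" (4-byte int) is an assumption.
    """
    fmt = "<{}{}".format(count, element)
    if offset < 0 or offset + struct.calcsize(fmt) > len(buf):
        raise ValueError("buffer too short for the reserved block")
    values = struct.unpack_from(fmt, buf, offset)
    return all(value == 0 for value in values)

# Hypothetical buffers: one with a clean reserved block, one with a stray byte.
clean = b"\x00" * 128
dirty = bytearray(clean)
dirty[4] = 1

clean_ok = reserved_block_is_zero(clean, 0)
dirty_ok = reserved_block_is_zero(bytes(dirty), 0)
```

A reader that finds non-zero data where it expects padding has most likely lost alignment with the file, which is consistent with a structure-size mismatch between the writer and the reader.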

damienrj commented 8 years ago

Thanks for the help, I agree it seems like there isn't much you can do with regards to the java.io.IOException. I will check whether moving the weights to the end, or removing them altogether, fixes the issue.

damienrj commented 8 years ago

Okay, it does look like there is something going on with the server, maybe the 32-bit/64-bit compatibility issue you mentioned. On the bright side I was able to get the scores validated when running on my laptop, just need to figure out the source of the problem with the server.
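For anyone validating scores the same way, a minimal comparison sketch follows. The first pair of values is the mismatch reported at the top of this thread; the second pair is hypothetical, and the tolerances are assumptions chosen with single-precision arithmetic in mind.

```python
import math

def score_mismatches(expected, actual, rel_tol=1e-6, abs_tol=1e-9):
    """Return (row, expected, actual) for every pair outside the tolerance."""
    if len(expected) != len(actual):
        raise ValueError("score lists differ in length")
    return [(row, e, a)
            for row, (e, a) in enumerate(zip(expected, actual))
            if not math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)]

xgboost_scores = [1.641659e-07, 0.25]  # second value is hypothetical
pmml_scores = [0.969313, 0.25]         # second value is hypothetical

mismatches = score_mismatches(xgboost_scores, pmml_scores)
```

Running the same comparison on the laptop and on the server, with the same model file and the same input rows, should help narrow down whether the problem is in training, in the saved model, or in the conversion.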

vruusmann commented 8 years ago

There's another issue about R vs. PMML mismatch: https://github.com/jpmml/jpmml-xgboost/issues/9

The above issue relates to the use of the missing argument with the xgboost() function call. By any chance, does your use case include custom missing value indicators?

damienrj commented 8 years ago

I was using the DMatrix to fill in values, but thought it might cause a problem, so I was filling in missing values with zeroes before passing the data into the DMatrix.

xgtrain = xgb.DMatrix(train[clf.signal_names].values, label=train['label'].values, feature_names=clf.signal_names, weight=train.precision_weight)

xgtest = xgb.DMatrix(testing[clf.signal_names].values, label=testing['label'].values, feature_names=clf.signal_names, weight=train.precision_weight)

xgeval = xgb.DMatrix(eval[clf.signal_names].values, label=eval['label'].values, feature_names=clf.signal_names, weight=eval.precision_weight)
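
As a toy illustration of why zero-filling and true missing values can score differently (this is not XGBoost code): each split in a boosted tree compares the feature against a threshold, while a missing value skips the comparison and follows the node's learned default direction. The threshold and default direction below are hypothetical.

```python
import math

def route(value, threshold, default_left):
    """Return the branch taken at one split node.

    Missing values (None or NaN) follow the node's default direction;
    everything else is compared against the threshold.
    """
    if value is None or math.isnan(value):
        return "left" if default_left else "right"
    return "left" if value < threshold else "right"

# Hypothetical split: threshold 0.5, missing values sent to the right.
threshold = 0.5
default_left = False

missing_branch = route(float("nan"), threshold, default_left)
zero_branch = route(0.0, threshold, default_left)
```

With these values the missing input goes right and the zero-filled input goes left, so the two reach different leaves. Scores should still agree as long as training and scoring apply the same fill rule on both the XGBoost and the PMML side.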