Closed xianlin666 closed 1 year ago
train code
val xgboost = new XGBoostRegressor().setMissing(0.0F)
It's probably the XGBoostRegressor.missing
property that is causing this - it's not converted automatically.
Open your PMML file in a text editor and check whether all continuous field declarations contain a DataField/Value
child element, as shown below:
<DataField name="myfield">
<Value value="0.0" property="missing"/>
</DataField>
If they are missing, then you might try adding them manually, and re-run the prediction - the results should be correct now.
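For a PMML file with many fields, the manual edit can be scripted. Below is a minimal sketch, not part of any JPMML library: the names AddMissingMarkers, addMissingValue and skip are hypothetical, and it relies only on the JDK's built-in DOM parser. It assumes that every continuous DataField except those listed in skip (for example, the target field) should get the marker.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.transform.TransformerFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult
import org.w3c.dom.Element
import org.xml.sax.InputSource

// Hypothetical helper, for illustration only.
object AddMissingMarkers {

  /** Appends <Value value="..." property="missing"/> to every continuous
    * DataField that does not already declare a missing value marker. */
  def addMissingValue(pmml: String,
                      missing: String = "0.0",
                      skip: Set[String] = Set.empty): String = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new InputSource(new StringReader(pmml)))

    val fields = doc.getElementsByTagName("DataField")
    for (i <- 0 until fields.getLength) {
      val field = fields.item(i).asInstanceOf[Element]
      val values = field.getElementsByTagName("Value")
      val alreadyMarked = (0 until values.getLength).exists { j =>
        values.item(j).asInstanceOf[Element].getAttribute("property") == "missing"
      }
      val isContinuous = field.getAttribute("optype") == "continuous"
      val isSkipped = skip.contains(field.getAttribute("name"))

      if (isContinuous && !isSkipped && !alreadyMarked) {
        // Value elements come last inside DataField, so appending is safe.
        val value = doc.createElement("Value")
        value.setAttribute("value", missing)
        value.setAttribute("property", "missing")
        field.appendChild(value)
      }
    }

    // Serialize the modified document back to a string.
    val out = new StringWriter()
    TransformerFactory.newInstance().newTransformer()
      .transform(new DOMSource(doc), new StreamResult(out))
    out.toString
  }
}
```

Calling AddMissingMarkers.addMissingValue(pmmlText, "0.0", Set("label")) would return the patched document as a string, where "label" stands in for whatever your target field is called. Review the result before using it, since fields where 0.0 is a legitimate value should be listed in skip.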
What are your Apache Spark and JPMML-SparkML versions?
I refactored XGBoost missing value handling in JPMML-XGBoost 1.7.1: https://github.com/jpmml/jpmml-xgboost/commit/57192fb9835af9cf9fd8974034afcf76fc107d17
The newly introduced org.jpmml.xgboost.HasXGBoostOptions#OPTION_MISSING
conversion option is not integrated into the JPMML-SparkML library yet.
But I have not found any element like missing in my PMML file. There is only "missingValueStrategy" in a Segment, like this. Will it influence the result?
<Segment id="11">
<True/>
<TreeModel functionName="regression" missingValueStrategy="defaultChild" splitCharacteristic="binarySplit" x-mathContext="float">
<MiningSchema>
<MiningField name="float(Min_value_sent)"/>
My Spark version is 2.4.8, and jpmml-sparkml is 1.5.14.
But I have not found any element like missing in my PMML file.
If there are no DataField/Value@property="missing"
elements in your PMML document, then it means that the (J)PMML evaluator is not instructed to "re-classify" the 0.0
value from the valid value space to the missing value space. Sounds logical, no?
There is only "missingValueStrategy" in a Segment, like this. Will it influence the result?
The TreeModel@missingValueStrategy
instructs what to do about a missing model prediction. It does not interact with model inputs (aka features) in any way.
My spark version is 2.4.8, and jpmml-sparkml is 1.5.14.
That's a really old version, which is no longer supported/maintained by me.
When I implement a fix for this issue, you will need to back-port it to the 1.5.X branch manually.
train code
val xgboost = new XGBoostRegressor().setMissing(0.0F)
In the meantime, it should be possible to make the (J)PMML prediction come out correct if you replace 0.0
values with Double.NaN
values in your test set.
Something like:
val testDfNaN = test_df.na.replace("*", Map(0.0 -> Double.NaN)) // "*" applies the replacement to all numeric columns
The DataField/Value
element would do this automatically inside the model, but since it's currently unavailable for you, you could do it manually outside of the model.
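If the PMML model is scored outside of Apache Spark, the same replacement can be applied to each input record before it is handed to the evaluator. A minimal sketch in plain Scala, assuming the record is held as a Map from field name to Double (the names MissingAsNaN and zeroToNaN are hypothetical, and no evaluator API is shown):

```scala
// Hypothetical helper, for illustration only.
object MissingAsNaN {

  /** Replaces every value equal to the missing marker (0.0 by default)
    * with Double.NaN, leaving all other values untouched. */
  def zeroToNaN(row: Map[String, Double], missing: Double = 0.0): Map[String, Double] =
    row.map { case (name, value) =>
      name -> (if (value == missing) Double.NaN else value)
    }
}
```

For example, MissingAsNaN.zeroToNaN(Map("a" -> 0.0, "b" -> 1.5)) keeps "b" at 1.5 and turns "a" into NaN. This mirrors what the DataField/Value declaration does inside the model, so apply only one of the two approaches consistently.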
Thanks for your patient answers. I have solved this by adding <Value value="0.0" property="missing"/>
to my PMML file manually. Now I get the expected result when importing the PMML model.
I used Scala to train an XGBoost model and recorded its predictions on the test data before exporting the model to PMML. When I then imported the PMML model and predicted on the same test data, I got totally different results. I don't know what kind of situation could lead to this. train code:
import predict: