jpmml / sklearn2pmml

Python library for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

PMML target output from float to double precision #362

Closed. AraceliAL closed this issue 1 year ago

AraceliAL commented 1 year ago

Good afternoon!

I was wondering if there's a way to change the dataType returned by the PMML from float to double. I am training an XGBoost model, which works with floats, so the output is returned as float as well. Nonetheless, I wanted to know if there's any chance that something like this can be modified in the PMML.

<OutputField name="Probability_0" optype="continuous" dataType="float" feature="probability" value="0"/>

Thanks a lot!!

vruusmann commented 1 year ago

> The thing is that I am training an XGBoost, that works with floats, therefore the output is returned as float as well.

The core idea of (J)PMML is to reproduce the behaviour of the original ML framework as closely as possible.

So, if the original is doing all calculations as float, and returning predicted values also as float, then (J)PMML will do exactly the same. Returning predicted values as double would be a functional defect, IMHO.
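To illustrate why widening does not buy extra precision, here is a minimal Python sketch (not part of SkLearn2PMML or JPMML; the helper name `widen_float32` is made up for this example). It rounds a value to single precision and reads it back as a double, which is roughly what relabelling a float result as double amounts to:

```python
import struct

def widen_float32(value: float) -> float:
    # Round to IEEE 754 single precision, then read the result back
    # as a Python float (which is a double).
    return struct.unpack("f", struct.pack("f", value))[0]

probability = 0.1
widened = widen_float32(probability)

print(widened)                 # 0.10000000149011612
print(widened == probability)  # False
```

The widened value still carries the float rounding error, so a double-typed output would look more precise than the underlying calculation actually is.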

> I wanted to know if there's any chance that something like this can be modified in the PMML.

There is no API for doing this at the SkLearn2PMML package level. But you could do this easily yourself at the JPMML-Model library level (read file into memory, update the object, write from memory back to file).

Or, if Java is not your thing, you could use something very basic such as a command-line text find/replace (e.g. the sed tool).
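For readers who would rather stay in Python, the same find/replace idea can be sketched with the standard `re` module. This is only an illustration under a few assumptions: the function name `widen_output_fields` is hypothetical, the PMML is treated as plain text, and only `OutputField` elements are touched, so the model still computes in float internally and merely reports the result as double.

```python
import re

# Matches an <OutputField ...> start tag up to and including dataType="float".
# Other elements (DataField, MiningField, ...) are deliberately left alone.
_OUTPUT_FIELD_FLOAT = re.compile(r'(<OutputField\b[^>]*?\bdataType=")float(")')

def widen_output_fields(pmml_text: str) -> str:
    """Return pmml_text with OutputField dataType="float" changed to "double"."""
    return _OUTPUT_FIELD_FLOAT.sub(r'\g<1>double\g<2>', pmml_text)

# The OutputField element quoted in the question above.
sample = ('<OutputField name="Probability_0" optype="continuous" '
          'dataType="float" feature="probability" value="0"/>')

converted = widen_output_fields(sample)
print(converted)
```

To apply this to a real file, read the PMML text, pass it through `widen_output_fields`, and write the result back (for example with `pathlib.Path.read_text` and `write_text`). Keeping a copy of the original file is a sensible precaution.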

AraceliAL commented 1 year ago

Thanks a lot! :)