Closed pchitimi closed 2 weeks ago
I am seeing the following issue when attempting to export a model Pipeline to PMML 4.3.
Exception in thread "main" java.lang.UnsupportedOperationException at org.dmg.pmml.Version$1.getVersion(Version.java:23)
This exception is thrown by the Version#XPMML special enum constant:
https://github.com/jpmml/jpmml-model/blob/1.6.5/pmml-model/src/main/java/org/dmg/pmml/Version.java#L16-L25
It means that your model may be PMML 4.3 compatible, but it "contains" some vendor extensions, ie. XML markup (typically, some XML attribute) which is not part of the PMML specification.
Anyway, the good news is that if you are using JPMML converters (on the Python ML side), and JPMML evaluators (on the Java application side), then this vendor extension is likely to be recognized/supported in both PMML 4.3 and 4.4 modes.
The SkLearn2PMML package should contain special logic for dealing with vendor extensions. The Version#XPMML enum constant is not a standalone PMML version per se. It's more like a "mask" on top of some valid PMML version such as PMML 4.3 or 4.4 (to be interpreted as "PMML 4.3 with some JPMML-specific attributes").
Now, thinking about this issue, I can think of the following improvements:
The Version#XPMML enum constant needs special handling. It should never cause the version downgrade to fail. At most, it may cause some warning messages to be emitted (eg. "The generated document is PMML schema version $major.$minor compatible, but contains such-and-such JPMML vendor extensions").
The version check (the org.jpmml.model.visitors.VersionInspector class) simply gives a "yes, all good" or "no, something is not right" binary answer, which is not sufficient.
@pchitimi What you can try right now to clarify the situation: export your model using the default (ie. latest) PMML schema version, and open it in a text editor; then, search for XML elements and attributes whose names start with "x-" (letter "X" followed by a hyphen). How many/which can you find?
If it's only one or two pieces of markup, we can verify them together, and you can then proceed to perform the version downgrade manually - by editing the XML namespace declaration.
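The search described above can also be scripted instead of done by hand in a text editor. The following is a minimal sketch using only the Python standard library; the function names are made up for illustration, and matching the "x-" prefix case-insensitively is an assumption, not something SkLearn2PMML does itself.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def local_name(name):
    """Strip the '{namespace}' prefix that ElementTree prepends to qualified names."""
    return name.rsplit("}", 1)[-1]

def find_vendor_extensions(pmml_source):
    """Count elements and attributes whose local name starts with 'x-'.

    pmml_source is the PMML document as str or bytes.
    Matching is case-insensitive (an assumption made for this sketch).
    """
    counts = Counter()
    root = ET.fromstring(pmml_source)
    for element in root.iter():
        element_name = local_name(element.tag)
        if element_name.lower().startswith("x-"):
            counts["<" + element_name + ">"] += 1
        for attribute_name in element.attrib:
            attribute_local = local_name(attribute_name)
            if attribute_local.lower().startswith("x-"):
                # Report in the Element@attribute notation used in this thread
                counts[element_name + "@" + attribute_local] += 1
    return counts
```

To run it against an exported file, pass the raw bytes, for example `find_vendor_extensions(pathlib.Path("model.pmml").read_bytes())`, where `model.pmml` is a placeholder file name. The returned counter gives both the unique names and the total number of occurrences.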
Thinking about this issue, then I can think of the following improvements:
Also, perhaps the version downgrade functionality should be available as a separate SkLearn2PMML utility function.
This functionality potentially has many controlling options. Adding them to the main sklearn2pmml.sklearn2pmml utility function as extra parameters would complicate the situation too much.
Thank you very much for the detailed response including the potential improvement paths @vruusmann!
As per your guidance, I was able to identify 4 unique (97 total) XML element/attributes whose name starts with "x-":
<MiningModel functionName="regression" x-mathContext="float">
<MiningModel functionName="classification" algorithmName="XGBoost (GBTree)" x-mathContext="float">
<RegressionModel functionName="classification" normalizationMethod="logit" x-mathContext="float">
<TreeModel functionName="regression" noTrueChildStrategy="returnLastPrediction" x-mathContext="float">
I was able to identify 4 unique XML element/attributes whose name starts with "x-"
They are all <Model>@x-mathContext attributes, which instruct the JPMML evaluator to carry out all model-internal computations using 32-bit floating point data type/math operations (the default would be 64-bit).
Fundamentally, this particular attribute can be omitted without breaking the underlying model (the predicted results will come out with extra precision, which qualifies as "noise"). It is a very ancient vendor extension, which should be recognized by all JPMML-Evaluator 1.4.X and newer versions.
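To get a feel for how small the difference between 32-bit and 64-bit math is, here is a minimal illustration in plain Python. It is not how JPMML-Evaluator implements the math context; it only emulates 32-bit rounding with the standard `struct` module, and the list of "leaf scores" is made-up data.

```python
import struct

def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", value))[0]

def sum_float64(values):
    """Accumulate in ordinary 64-bit floating point."""
    total = 0.0
    for value in values:
        total += value
    return total

def sum_float32(values):
    """Accumulate while rounding every operand and partial sum to 32 bits."""
    total = 0.0
    for value in values:
        total = to_float32(total + to_float32(value))
    return total

# Hypothetical per-tree scores of a small boosted ensemble
leaf_scores = [0.1] * 10
difference = abs(sum_float32(leaf_scores) - sum_float64(leaf_scores))
```

The two sums are not identical, but `difference` stays far below anything that would matter for a prediction, which is why dropping the attribute changes the results only at the "noise" level.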
Anyway, my expectation is that the SkLearn2PMML package should never fail because of the <Model>@x-mathContext attribute.
The trouble is that this attribute is always present for XGBoost models.
Gotcha, just to make sure I understand:
I can perform the version downgrade manually by editing the XML namespace declaration
I can remove the <Model>@x-mathContext attributes without significant changes to the model (only a precision change)
Is my understanding correct or did I miss anything?
Yes, these two changes should achieve the "PMML schema version downgrade" from 4.4 to 4.3 for XGBoost models.
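The two manual changes can be sketched as a small text-level rewrite. This is a minimal sketch and not part of SkLearn2PMML: the function name is hypothetical, it assumes the standard `http://www.dmg.org/PMML-4_4` style namespace URL, and it additionally updates the PMML@version attribute as an optional extra.

```python
import re

def downgrade_pmml_text(pmml_text, source="4.4", target="4.3"):
    """Apply the manual PMML schema version downgrade to a PMML document string."""
    source_ns = "http://www.dmg.org/PMML-" + source.replace(".", "_")
    target_ns = "http://www.dmg.org/PMML-" + target.replace(".", "_")
    # Change 1: edit the XML namespace declaration
    pmml_text = pmml_text.replace(source_ns, target_ns)
    # Change 2: drop every x-mathContext attribute, including its leading whitespace
    pmml_text = re.sub(r'\s+x-mathContext="[^"]*"', "", pmml_text)
    # Optional extra: keep the version attribute of the root PMML element in sync
    pmml_text = re.sub(
        r'(<PMML\b[^>]*?\bversion=")' + re.escape(source) + r'(")',
        r"\g<1>" + target + r"\g<2>",
        pmml_text,
        count=1,
    )
    return pmml_text
```

Working on the raw text rather than on a parsed XML tree keeps the rest of the file byte-for-byte unchanged, so the result can be diffed against the original to confirm that nothing else was touched.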
For comparison, you may train a toy LightGBM model (structurally very similar to XGBoost models), and export it twice: once with pmml_schema = None and once with pmml_schema = "4.3".
Then diff these two files (eg. using the command-line diff tool) - you will see exactly what was changed, line by line. The differences should be the XML namespace URL and the PMML@version attribute value (the latter being non-critical).
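If a command-line diff tool is not at hand, the same comparison can be done with Python's standard `difflib` module. This is a minimal sketch; the function name and the file labels are placeholders.

```python
import difflib

def diff_pmml(latest_text, downgraded_text):
    """Return unified diff lines between two PMML documents given as strings."""
    return list(difflib.unified_diff(
        latest_text.splitlines(),
        downgraded_text.splitlines(),
        fromfile="latest.pmml",       # placeholder label
        tofile="downgraded.pmml",     # placeholder label
        lineterm="",
        n=0,                          # no context lines, changed lines only
    ))

def changed_lines(diff_lines):
    """Keep only removed/added lines, skipping the '---'/'+++' file headers."""
    return [
        line for line in diff_lines
        if line.startswith(("-", "+")) and not line.startswith(("---", "+++"))
    ]
```

Read both exported files into strings, pass them to `diff_pmml`, and print the lines returned by `changed_lines` to see which markup the downgrade touched.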
LightGBM models don't need the <Model>@x-mathContext attribute, so the conversion should succeed every time.
I've just released SkLearn2PMML 0.111.1 to PyPI, which permits the PMML schema version downgrade to proceed even if there are incompatibilities around.
The full list of incompatibilities is printed to the console; the person performing the conversion can review and correct them manually if necessary.
For example, the console log when converting/downgrading an example XGBAudit model to PMML 4.3:
SEVERE: The PMML object has 2 incompatibilities with the requested PMML schema version:
WARNING: Attribute with value Segmentation@missingPredictionTreatment=returnMissing is not supported (2 cases)
The above log shows that there are two Segmentation@missingPredictionTreatment attributes around, which are not part of PMML 4.3 (they were introduced in PMML 4.4). However, this qualifies as an "ignorable" incompatibility, because the presence/absence of this attribute does not change the actual prediction logic - it's a modifier that instructs the predictor to safely/cleanly terminate the prediction process if the first stage of the XGBoost model yielded a missing result (due to one or more missing input values).
Hello! I am seeing the following issue when attempting to export a model Pipeline to PMML 4.3. I am uncertain if the model requires at least 4.4 or if there are other issues at play here.
Using the debug flag, the output I observe is as follows:
Thank you for your assistance!