Open AbdealiLoKo opened 2 years ago
Moved this issue to its rightful project (the stack trace originates from the org.jpmml.python package).
In short, the org.jpmml.python.ExpressionTranslator component supports one-dimensional array indexing syntax (e.g. X[$first_dim]), but it does not support two- or higher-dimensional array indexing syntax (e.g. X[$first_dim][$second_dim]).
This is pretty much "by design", because the PMML language deals with scalar-type values, not collection- or array-type values.
The one-dimensional array indexing syntax is supported, because JPMML converters keep track of data frame columns automatically.
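To check ahead of time whether an expression stays within the supported one-dimensional case, one option is to count chained subscripts with Python's standard ast module. This is only a rough sketch: the helper name max_subscript_depth is made up for illustration, it is not part of any JPMML library, and it does not reflect how the ExpressionTranslator itself parses expressions.

```python
import ast


def max_subscript_depth(expression: str) -> int:
    """Return the longest chain of stacked subscripts in an expression.

    "X[0]" gives 1 and "X[0][1]" gives 2. Hypothetical helper for illustration.
    """
    tree = ast.parse(expression, mode="eval")

    def chain_length(node: ast.AST) -> int:
        # Walk down through Subscript nodes that sit directly on top of each other
        depth = 0
        while isinstance(node, ast.Subscript):
            depth += 1
            node = node.value
        return depth

    # Every node is a candidate chain head; non-subscript nodes contribute 0
    return max((chain_length(node) for node in ast.walk(tree)), default=0)


for expr in ("X[0] + X[1]", "X[:][1]"):
    kind = "one-dimensional" if max_subscript_depth(expr) <= 1 else "multi-dimensional"
    print(f"{expr!r}: {kind} indexing")
```

Note that the sketch only counts chained brackets such as X[a][b]; a NumPy-style tuple index such as X[a, b] would still be reported as depth 1.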
I'm not closing this feature request outright, because multi-dimensional array indexing support is foreseeable in the longer term (relevant to both the JPMML-SkLearn and JPMML-SparkML projects).
The main requirement is that JPMML converters need to be supplied information about "extra dimensions" first.
For example, in the case of SkLearn2PMML/JPMML-SkLearn, this information could be conveyed in the form of a sklearn2pmml.decoration.ArrayDomain decorator class. When the JPMML-SkLearn converter sees this pipeline step, it updates the base feature definition accordingly. Next to ArrayDomain (2D support) there could be MatrixDomain for higher-dimensionality problems.
Something like this:
# ArrayDomain is a proposed class; it does not exist in sklearn2pmml yet
transformer = Pipeline([
  ("decorator", ArrayDomain(second_axis = [..])),
  ("row_extractor", ExpressionTransformer("X[:][1]"))
])
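To make the decorator idea a bit more concrete, here is a minimal Python-side sketch of what such a step might carry. Everything in it is an assumption: the class name ArrayDomainSketch, the second_axis parameter semantics, and the validation behavior are invented for illustration, and sklearn2pmml does not ship anything like it today. The sketch only follows the usual fit/transform convention and deliberately avoids any library imports.

```python
class ArrayDomainSketch:
    """Hypothetical stand-in for the proposed ArrayDomain decorator."""

    def __init__(self, second_axis):
        # Labels for the elements along the second axis, e.g. ["x", "y", "z"]
        self.second_axis = list(second_axis)

    def fit(self, X, y=None):
        # Check that every array-valued cell matches the declared second axis
        expected = len(self.second_axis)
        for row in X:
            if len(row) != expected:
                raise ValueError(
                    f"Expected {expected} elements per row, got {len(row)}"
                )
        return self

    def transform(self, X):
        # A decorator only describes the data; values pass through unchanged
        return X


decorator = ArrayDomainSketch(second_axis=["x", "y", "z"])
print(decorator.fit([[1, 2, 3], [4, 5, 6]]).transform([[1, 2, 3]]))
```

The point of the design is that the step changes no values. Its only job would be to hand the converter the "extra dimension" metadata it currently has no way of knowing.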
For starters, the JPMML-Converter project needs to define a specialized feature class (that the JPMML-Python expression translator component could use in this particular scenario).
Something like org.jpmml.converter.ArrayFeature.
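Until something along these lines exists, one possible workaround is to expand the array-valued column into scalar columns before the data reaches the pipeline, so that every expression only needs one-dimensional indexing. This is a sketch under assumptions, not a documented JPMML feature: the helper name expand_array_column and the "name_index" column naming are made up, and rows are modeled as plain dictionaries to keep the example free of dependencies.

```python
def expand_array_column(rows, column, length):
    """Replace one array-valued column with `length` scalar columns.

    Hypothetical helper: {"vec": [a, b]} becomes {"vec_0": a, "vec_1": b}.
    """
    expanded = []
    for row in rows:
        values = row[column]
        if len(values) != length:
            raise ValueError(
                f"Column {column!r} has {len(values)} elements, expected {length}"
            )
        # Copy every other column as-is, then add one scalar column per element
        new_row = {key: value for key, value in row.items() if key != column}
        for index, value in enumerate(values):
            new_row[f"{column}_{index}"] = value
        expanded.append(new_row)
    return expanded


rows = [{"id": 1, "vec": [0.1, 0.2]}, {"id": 2, "vec": [0.3, 0.4]}]
print(expand_array_column(rows, "vec", 2))
```

The trade-off is that the array length has to be fixed and known up front, which is the same "extra dimensions" information the converter would need anyway.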
Hi, I have a scenario where I need to use an array as an input column to my pipeline. I've reduced the issue I'm having to a minimal example:
The above pipeline works fine in my Jupyter notebook, but converting it to PMML gives the error: