Support for bitwise logical operators

mkuiack commented 3 years ago

I'm writing a feature-generation preprocessing step that uses the ExpressionTransformer to check whether a value is close to a round multiple of 10:

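from sklearn_pandas import DataFrameMapper
from sklearn2pmml.decoration import Alias
from sklearn2pmml.preprocessing import ExpressionTransformer

feature_5 = DataFrameMapper([
    (["Amount_Usd"],
     [Alias(ExpressionTransformer("1 if ((X[0] % 10.) <= 0.1) | ((X[0] % 10.) >= 9.9) else 0"),
            name="feature_5", prefit=True)]),
])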

feature_5

DataFrameMapper(drop_cols=[],
                features=[(['Amount_Usd'],
                           [Alias(name='feature_5', prefit=True,
                                  transformer=ExpressionTransformer(expr='1 if ((X[0] % 10.) <= 0.1) | ((X[0] % 10.) >= 9.9) else 0'))])])
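As a point of reference, here is a minimal sketch of the intended feature computed directly in NumPy, outside of ExpressionTransformer and PMML. The helper name round_ten_flag and the sample amounts are made up for illustration; only the thresholds 0.1 and 9.9 come from the expression above.

```python
import numpy


def round_ten_flag(amounts):
    """Hypothetical helper: 1 where a value lies within 0.1 of a multiple of 10, else 0."""
    remainder = numpy.asarray(amounts, dtype=float) % 10.
    # Element-wise bitwise OR of two boolean arrays, then map True/False to 1/0
    return numpy.where((remainder <= 0.1) | (remainder >= 9.9), 1, 0)


flags = round_ten_flag([100.0, 19.95, 42.5, 0.05])
```

This is plain vectorized NumPy, so the bitwise | works element-wise here; it says nothing about what the JPMML expression translator accepts.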

The expression is valid Python, but there seems to be an issue with the translation of | to LogicalOr?

> SEVERE: Failed to convert PKL to PMML
org.jpmml.python.TokenMgrException: Lexical error at line 1, column 28.  Encountered: "|" (124), after : ""
    at org.jpmml.python.ExpressionTranslatorTokenManager.getNextToken(ExpressionTranslatorTokenManager.java:619)
    at org.jpmml.python.ExpressionTranslator.jj_scan_token(ExpressionTranslator.java:1967)
    at org.jpmml.python.ExpressionTranslator.jj_3R_TrailerFunctionInvocationExpression_745_9_59(ExpressionTranslator.java:1182)
    at org.jpmml.python.ExpressionTranslator.jj_3R_PrimaryExpression_634_58_44(ExpressionTranslator.java:1386)
    at org.jpmml.python.ExpressionTranslator.jj_3R_PrimaryExpression_634_58_35(ExpressionTranslator.java:1377)
    at org.jpmml.python.ExpressionTranslator.jj_3R_PrimaryExpression_634_17_28(ExpressionTranslator.java:1581)
    at org.jpmml.python.ExpressionTranslator.jj_3R_PrimaryExpression_623_9_26(ExpressionTranslator.java:1633)
    at org.jpmml.python.ExpressionTranslator.jj_3R_UnaryExpression_605_17_25(ExpressionTranslator.java:1651)
    at org.jpmml.python.ExpressionTranslator.jj_3R_UnaryExpression_600_9_20(ExpressionTranslator.java:1697)
    at org.jpmml.python.ExpressionTranslator.jj_3R_MultiplicativeExpression_587_9_15(ExpressionTranslator.java:1732)
    at org.jpmml.python.ExpressionTranslator.jj_3R_AdditiveExpression_563_9_12(ExpressionTranslator.java:1124)
    at org.jpmml.python.ExpressionTranslator.jj_3_1(ExpressionTranslator.java:1174)
    at org.jpmml.python.ExpressionTranslator.jj_2_1(ExpressionTranslator.java:1069)
    at org.jpmml.python.ExpressionTranslator.ComparisonExpression(ExpressionTranslator.java:398)
    at org.jpmml.python.ExpressionTranslator.NegationExpression(ExpressionTranslator.java:387)
    at org.jpmml.python.ExpressionTranslator.LogicalAndExpression(ExpressionTranslator.java:357)
    at org.jpmml.python.ExpressionTranslator.LogicalOrExpression(ExpressionTranslator.java:336)
    at org.jpmml.python.ExpressionTranslator.IfElseExpression(ExpressionTranslator.java:317)
    at org.jpmml.python.ExpressionTranslator.Expression(ExpressionTranslator.java:310)
    at org.jpmml.python.ExpressionTranslator.IfElseExpression(ExpressionTranslator.java:321)
    at org.jpmml.python.ExpressionTranslator.Expression(ExpressionTranslator.java:310)
    at org.jpmml.python.ExpressionTranslator.translateExpressionInternal(ExpressionTranslator.java:304)
    at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:34)
    at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:23)
    at sklearn2pmml.preprocessing.ExpressionTransformer.encodeFeatures(ExpressionTransformer.java:52)
    at sklearn2pmml.decoration.Alias.encodeFeatures(Alias.java:56)
    at sklearn.Transformer.encode(Transformer.java:70)
    at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:73)
    at sklearn.Initializer.encodeFeatures(Initializer.java:48)
    at sklearn.Transformer.encode(Transformer.java:70)
    at sklearn.pipeline.FeatureUnion.encodeFeatures(FeatureUnion.java:45)
    at sklearn.Transformer.encode(Transformer.java:70)
    at sklearn.Composite.encodeFeatures(Composite.java:119)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:211)
    at org.jpmml.sklearn.Main.run(Main.java:226)
    at org.jpmml.sklearn.Main.main(Main.java:143)

mkuiack commented 3 years ago

I've changed the expression to:

1 if ((X[0] % 10.) <= 0.1) or ((X[0] % 10.) >= 9.9) else 0

This solved the issue with sklearn2pmml, but in Python the same expression raises:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
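The error comes from Python asking a multi-element NumPy array for a single truth value: the or operator (like the if ... else condition) needs one bool, whereas a comparison on an array yields one bool per element. A minimal reproduction, assuming X[0] holds a whole NumPy column rather than a single value (an assumption about how the expression is being evaluated):

```python
import numpy

# Assumption: X[0] is a whole column, stood in for by this array
column = numpy.array([100.0, 42.5])
remainder = column % 10.

message = ""
try:
    # "or" calls bool() on the left-hand array, which is ambiguous for 2+ elements
    result = 1 if (remainder <= 0.1) or (remainder >= 9.9) else 0
except ValueError as err:
    result = None
    message = str(err)

# The same expression on a single float is unambiguous and evaluates normally
scalar = 100.0 % 10.
ok = 1 if (scalar <= 0.1) or (scalar >= 9.9) else 0
```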

vruusmann commented 3 years ago

Moved this issue to the JPMML-Python project, because this is where the expression translator component actually resides.

The expression is valid Python, but there seems to be an issue with the translation of | to LogicalOr?

For simplicity's sake, the grammar only defines the boolean operators 'and', 'or' and 'not': https://github.com/jpmml/jpmml-python/blob/master/src/main/javacc/expression.jj#L350-L354

I haven't considered bitwise operators, because I'm not entirely sure that they're functionally equivalent to the boolean operators. It would also need to be verified that they're functionally compatible with the PMML built-in functions and, or and not.
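To illustrate the equivalence question in plain Python (not PMML), a small sketch: on two bool operands, | and or share the same truth table, but they diverge on other operand types, and | binds more tightly than the comparison operators, which is why the parentheses in the original expression are required. The sample values are arbitrary.

```python
import itertools

# 1. On two bool operands, bitwise | and logical "or" agree on every input
same_truth_table = all(
    (a | b) == (a or b) for a, b in itertools.product([False, True], repeat=2)
)

# 2. On other operands they differ: "or" returns one of its operands,
#    while "|" computes a bitwise result
or_ints = 2 or 4     # 2, the first truthy operand
bitor_ints = 2 | 4   # 6, i.e. 0b010 | 0b100

# 3. Precedence: "|" binds tighter than "<=" and ">=", so without parentheses
#    the test becomes the chained comparison r <= (0.1 | r) >= 9.9,
#    which fails because | is not defined for floats
r = 0.05
try:
    unparenthesized = r <= 0.1 | r >= 9.9
    precedence_error = None
except TypeError as err:
    precedence_error = type(err).__name__
```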

IIRC, the PMML built-in function and behaves more like the && operator (terms are evaluated lazily, from left to right) than like the & operator (all terms are evaluated eagerly, even if an earlier term is already known to have returned a false/0 value).
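The lazy versus eager distinction can be observed in plain Python, where and short-circuits and & does not. A minimal sketch with a hypothetical call-recording helper term(); it only shows Python's behavior and says nothing definitive about how a given PMML engine evaluates and.

```python
calls = []


def term(name, value):
    """Hypothetical helper: records that it was evaluated, then returns value."""
    calls.append(name)
    return value


# Logical "and" stops at the first false term; the right-hand term never runs
calls.clear()
lazy = term("left", False) and term("right", True)
lazy_calls = list(calls)

# Bitwise "&" evaluates both operands before combining them
calls.clear()
eager = term("left", False) & term("right", True)
eager_calls = list(calls)
```

Both expressions produce False; only the number of evaluated terms differs.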

vruusmann commented 3 years ago

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This error is raised by Numpy's apply method: https://github.com/jpmml/sklearn2pmml/blob/0.74.4/sklearn2pmml/util/__init__.py#L18

Can you rewrite the sklearn2pmml.util.eval_rows() function so that the boolean operators and and or are tolerated? https://github.com/jpmml/sklearn2pmml/blob/0.74.4/sklearn2pmml/util/__init__.py#L16-L23
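One possible approach is to make sure the expression only ever sees one row at a time, so that X[0] is a scalar and and/or behave as they do on plain values. A minimal sketch under that assumption; eval_rows_sketch is a hypothetical name, not the linked sklearn2pmml.util.eval_rows() implementation, and eval() must never be used on untrusted strings.

```python
import numpy


def eval_rows_sketch(X, expr):
    """Hypothetical: evaluate a Python expression string once per row of a 2-D array.

    Inside the expression, X refers to the current row, so X[0] is a scalar
    and the boolean operators and/or/not work without ambiguity.
    """
    X = numpy.asarray(X)
    code = compile(expr, "<expr>", "eval")
    # numpy is exposed as a global so expressions may call numpy.logical_or etc.
    results = [eval(code, {"numpy": numpy}, {"X": row}) for row in X]
    return numpy.asarray(results)


X = numpy.array([[100.0], [19.95], [42.5]])
expr = "1 if ((X[0] % 10.) <= 0.1) or ((X[0] % 10.) >= 9.9) else 0"
flags = eval_rows_sketch(X, expr)
```

The trade-off is speed: a per-row Python loop is much slower than vectorized NumPy on large frames.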

Pardon my limited Python skills.

vruusmann commented 3 years ago

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Here's a StackOverflow answer that suggests using the numpy.logical_and and numpy.logical_or functions (as replacements for the and and or operators, respectively): https://stackoverflow.com/a/45380236/

Looks like bitwise operators are verboten, but you could do:

import numpy
from sklearn2pmml.preprocessing import ExpressionTransformer

transformer = ExpressionTransformer("1 if numpy.logical_or((X[0] % 10.) <= 0.1, (X[0] % 10.) >= 9.9) else 0")
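For reference, numpy.logical_or accepts both scalars and arrays. The sketch below is plain NumPy, not the expression translator: on scalars it returns a single NumPy bool, which is usable as the if ... else condition, while on a multi-element array it returns an element-wise mask, which the if ... else form would still reject with the ambiguous-truth-value error. The sample values are arbitrary.

```python
import numpy

# Scalar inputs: a single numpy.bool_, fine as an if/else condition
remainder = 19.95 % 10.
condition = numpy.logical_or(remainder <= 0.1, remainder >= 9.9)
flag = 1 if condition else 0

# Array inputs: an element-wise boolean mask
remainders = numpy.array([100.0, 19.95, 42.5]) % 10.
mask = numpy.logical_or(remainders <= 0.1, remainders >= 9.9)
# Vectorized counterpart of "1 if ... else 0" for the whole column
flags = numpy.where(mask, 1, 0)
```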

mkuiack commented 3 years ago

I can have a look, but my skills are also limited 😄

How would numpy.logical_or be interpreted if the PMML is deployed on a system that doesn't have Python or Numpy installed?

vruusmann commented 3 years ago

How would numpy.logical_or be interpreted if the PMML is deployed on a system that doesn't have Python or Numpy installed?

There is no deployment-time Numpy dependency whatsoever, because Numpy logical functions are translated to PMML built-in functions: http://dmg.org/pmml/v4-4/BuiltinFunctions.html#boolean3
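To illustrate what "translated to PMML built-in functions" means, here is a sketch that emits a PMML Apply element for a NumPy logical function. The name mapping, the helper to_pmml_apply and the field names are assumptions for illustration, not JPMML-Python's actual translator; the operands are simplified to FieldRef elements, whereas a real translation would nest the translated comparison expressions.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from NumPy logical functions to PMML boolean built-in functions
NUMPY_TO_PMML = {
    "numpy.logical_and": "and",
    "numpy.logical_or": "or",
    "numpy.logical_not": "not",
}


def to_pmml_apply(numpy_function, field_names):
    """Hypothetical: build <Apply function="..."> with one FieldRef child per argument."""
    element = ET.Element("Apply", {"function": NUMPY_TO_PMML[numpy_function]})
    for name in field_names:
        ET.SubElement(element, "FieldRef", {"field": name})
    return element


pmml_snippet = ET.tostring(
    to_pmml_apply("numpy.logical_or", ["lowRemainder", "highRemainder"]),
    encoding="unicode",
)
```

The resulting markup only references PMML elements, so a PMML engine can evaluate it without Python or Numpy.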

The updated JPMML-Python library is included in the latest SkLearn2PMML version 0.75.0 (released this morning). Please upgrade, and see if your issue has been solved.

mkuiack commented 2 years ago

Upgrading to 0.75 and using numpy.logical_and solves the issue.

jpmml / jpmml-python

Support for bitwise logical operators #14