MultiDomain Expression Transformer default value

jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML

GNU Affero General Public License v3.0


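import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn_pandas import DataFrameMapper
from sklearn2pmml.decoration import Alias, ContinuousDomain, MultiDomain
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml.preprocessing import ExpressionTransformer

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
X.iloc[0, 0] = None  # inject one missing value
numeric_columns = X.columns
y = iris.target

# One ContinuousDomain decorator per numeric column
numeric_mapper_domain = [
    (
        [numeric_column],
        ContinuousDomain(missing_value_treatment="as_value", invalid_value_treatment="as_missing", missing_value_replacement=-999)
    )
    for numeric_column in numeric_columns
]

# https://openscoring.io/blog/2020/02/23/sklearn_feature_specification_pmml/
# Derived ratio feature computed from two columns
numeric_mapper_domain.append(
    (
        ['sepal length (cm)', 'sepal width (cm)'],
        [
            MultiDomain([None, None]),
            Alias(ExpressionTransformer('24 * X[0]/(X[1]+0.0000001)'), 'R0_1')
        ]
    )
)

# Create a PMMLPipeline
pmml_pipeline = PMMLPipeline(
    [
        ("mapper", DataFrameMapper(numeric_mapper_domain)),
        ("classifier", DecisionTreeClassifier())  # as an example
    ]
)
pmml_pipeline.target_fields = ["target"]
pmml_pipeline.fit(X, y)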

I followed the examples here: https://openscoring.io/blog/2020/02/23/sklearn_feature_specification_pmml

If you have a GitHub account, then you could ask your question(s) also in the blog's "feedback" section.

This particular issue would be a very good fit there - adding more explanations/code examples about a specific functionality.

Anyway, the primary intent of the MultiDomain decorator is to let you decorate a mixed list of categorical and continuous features. If you have only continuous features, then you can use good old ContinuousDomain as-is.

Please note that ContinuousDomain has multi-column support, whereas CategoricalDomain does not. If you need to feed multiple categorical features to an ExpressionTransformer, then you can bind/reorder elementary categorical decorators together using MultiDomain.

How can I control missing values / erroneous values in the ExpressionTransformer block?

Domain decorator classes are about capturing the domain of input features. They are not intended for performing additional transformations (such as missing or invalid value replacement) on already transformed features.

You should check out ExpressionTransformer.map_missing_to and ExpressionTransformer.default_value attributes, which correspond to Apply@mapMissingTo and Apply@defaultValue attributes, respectively: https://dmg.org/pmml/v4-4-1/Functions.html#xsdElement_Apply

See the "Output table for Apply" sub-section on the referenced page.
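To make the difference between the two attributes concrete, here is a minimal pure-Python sketch of how the "Output table for Apply" can be read. It is not the sklearn2pmml or JPMML implementation: the helper name `apply_with_fallbacks` is hypothetical, and it assumes that a failed evaluation (such as division by zero) is treated as a missing result. The Apply@invalidValueTreatment attribute is left out for brevity.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def apply_with_fallbacks(func, args, map_missing_to=None, default_value=None):
    """Hypothetical emulation of Apply@mapMissingTo and Apply@defaultValue."""
    # mapMissingTo looks at the INPUTS: if any argument is missing,
    # the function is not evaluated at all.
    if map_missing_to is not None and any(is_missing(a) for a in args):
        return map_missing_to
    try:
        result = func(*args)
    except (ArithmeticError, TypeError):
        # Assumption of this sketch: a failed evaluation counts as a missing result.
        result = None
    # defaultValue looks at the RESULT: it replaces a missing result.
    if default_value is not None and is_missing(result):
        return default_value
    return result


def ratio(a, b):
    return a / b


print(apply_with_fallbacks(ratio, [6.0, 3.0], map_missing_to=-1, default_value=-2))
print(apply_with_fallbacks(ratio, [None, 3.0], map_missing_to=-1, default_value=-2))
print(apply_with_fallbacks(ratio, [6.0, 0.0], map_missing_to=-1, default_value=-2))
```

Under these assumptions the three calls print 2.0, -1 and -2: a regular result, a missing input caught by map_missing_to, and a division by zero caught by default_value.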

I would like to be able to set missing_value_replacement on it

transformer = ExpressionTransformer('X[0] / (X[1] + 0.0000001)', map_missing_to = -1)

Your current expression "defends" against division-by-zero errors by adding a small constant (0.0000001) to the denominator.

You can get rid of it, and map all division-by-zero errors to a specific error code:

transformer = ExpressionTransformer('X[0] / X[1]', default_value = -2)

MultiDomain Expression Transformer default value #189