nhawrylyshyn closed this issue 10 months ago.
> I followed the examples here: https://openscoring.io/blog/2020/02/23/sklearn_feature_specification_pmml
If you have a GitHub account, then you could also ask your question(s) in the blog's "feedback" section.
This particular issue would be a very good fit there, because it calls for more explanations and code examples about a specific functionality.
Anyway, the primary intent of the `MultiDomain` decorator is to let you decorate a mixed list of categorical and continuous features. If you have only continuous features, then you can use good old `ContinuousDomain` as-is.
Please note that `ContinuousDomain` has multi-column support, whereas `CategoricalDomain` does not. If you need to feed multiple categorical features to an `ExpressionTransformer`, then you can bind/reorder elementary categorical decorators together using `MultiDomain`.
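To make the "bind/reorder" idea concrete, here is a toy pure-Python model of column-wise decoration. It is a sketch under loud assumptions: `continuous`, `categorical` and `multi_domain` are hypothetical helpers, not the sklearn2pmml API, and they only replace missing values and reject unknown categories, whereas the real decorators also capture domain information for the PMML document. What it shows is that each elementary decorator handles exactly one column, selected by its position in the list.

```python
import math
from typing import Any, Callable, Iterable, List, Optional, Sequence

# A "decorator" here is just a function applied to one cell of one column.
Decorator = Callable[[Any], Any]


def continuous(missing_value_replacement: Optional[float] = None) -> Decorator:
    """Hypothetical single-column continuous decorator (missing -> replacement)."""
    def decorate(value: Any) -> Any:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return missing_value_replacement
        return float(value)
    return decorate


def categorical(valid_values: Iterable[Any], missing_value_replacement: Any = None) -> Decorator:
    """Hypothetical single-column categorical decorator with a fixed set of valid values."""
    valid = set(valid_values)

    def decorate(value: Any) -> Any:
        if value is None:
            return missing_value_replacement
        if value not in valid:
            raise ValueError(f"Invalid categorical value: {value!r}")
        return value
    return decorate


def multi_domain(decorators: Sequence[Decorator], rows: Iterable[Sequence[Any]]) -> List[List[Any]]:
    """Apply the i-th decorator to the i-th column of every row."""
    result = []
    for row in rows:
        if len(row) != len(decorators):
            raise ValueError("Expected exactly one decorator per column")
        result.append([decorate(value) for decorate, value in zip(decorators, row)])
    return result


rows = [["red", 1.5], [None, float("nan")]]
decorated = multi_domain(
    [categorical({"red", "blue"}, missing_value_replacement="other"),
     continuous(missing_value_replacement=0.0)],
    rows,
)
print(decorated)  # [['red', 1.5], ['other', 0.0]]
```

In this model, swapping two entries of the decorator list changes which column each one is applied to, so the list order has to match the column order of the data.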
> How can I control missing value / erroneous values in the ExpressionTransformer block
Domain decorator classes are about capturing the domain of input features. They are not intended for performing additional transformations (such as missing or invalid value replacement) on already transformed features.
You should check out the `ExpressionTransformer.map_missing_to` and `ExpressionTransformer.default_value` attributes, which correspond to the `Apply@mapMissingTo` and `Apply@defaultValue` attributes, respectively:
https://dmg.org/pmml/v4-4-1/Functions.html#xsdElement_Apply
See the "Output table for Apply" sub-section on the referenced page.
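The following pure-Python sketch is a simplified, hypothetical model of how the two attributes divide the work for a single row, written to mirror the description in this reply. It is not the sklearn2pmml or JPMML implementation, and it does not reproduce every case of the "Output table for Apply" on the PMML page. Assumptions: `None` and NaN count as missing inputs, and an `ArithmeticError` or a NaN/infinite result counts as a failed evaluation; `apply_expression` and `ratio` are made-up names.

```python
import math
from typing import Any, Callable, Optional, Sequence


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing values (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def apply_expression(
    func: Callable[..., float],
    inputs: Sequence[Any],
    map_missing_to: Optional[float] = None,
    default_value: Optional[float] = None,
) -> Optional[float]:
    """Hypothetical emulation of Apply@mapMissingTo / Apply@defaultValue.

    A return value of None stands for a missing result.
    """
    # Any missing input: the function is not evaluated at all.
    if any(is_missing(x) for x in inputs):
        return map_missing_to
    try:
        result = func(*inputs)
    except ArithmeticError:
        # Evaluation failed (e.g. ZeroDivisionError): fall back to default_value.
        return default_value
    # A NaN or infinite result is also treated as a failed evaluation here.
    if is_missing(result) or math.isinf(result):
        return default_value
    return result


def ratio(x0: float, x1: float) -> float:
    return x0 / x1


print(apply_expression(ratio, [3.0, 1.5], map_missing_to=-1.0, default_value=-2.0))  # 2.0
print(apply_expression(ratio, [3.0, None], map_missing_to=-1.0, default_value=-2.0))  # -1.0
print(apply_expression(ratio, [3.0, 0.0], map_missing_to=-1.0, default_value=-2.0))  # -2.0
```

Keeping the two codes different (-1 for missing input, -2 for a failed evaluation) lets downstream consumers tell the two situations apart.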
> I would like to be able to set missing_value_replacement on it
transformer = ExpressionTransformer('X[0] / (X[1] + 0.0000001)', map_missing_to=-1)
Your current expression "defends" against division-by-zero errors by adding a small constant (0.0000001) to the denominator.
You can get rid of it and map all division-by-zero errors to a specific error code instead:
transformer = ExpressionTransformer('X[0] / X[1]', default_value=-2)
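The difference between the two expressions can be illustrated in plain Python. Both functions below are hypothetical stand-ins (Python's `ZeroDivisionError` plays the role of the PMML engine's division-by-zero failure): the epsilon-guarded ratio never fails on a zero denominator, but silently turns it into a very large, legitimate-looking number, whereas the error-code variant returns a sentinel that is easy to recognise.

```python
EPSILON = 0.0000001  # the constant used in the original expression


def ratio_with_epsilon(x0: float, x1: float) -> float:
    """Mimics 'X[0] / (X[1] + 0.0000001)': does not raise when x1 == 0."""
    return x0 / (x1 + EPSILON)


def ratio_with_error_code(x0: float, x1: float, default_value: float = -2.0) -> float:
    """Hypothetical stand-in for 'X[0] / X[1]' with default_value=-2."""
    try:
        return x0 / x1
    except ZeroDivisionError:
        # Division by zero is reported as an explicit error code.
        return default_value


for x0, x1 in [(3.0, 1.5), (3.0, 0.0)]:
    print(ratio_with_epsilon(x0, x1), ratio_with_error_code(x0, x1))
```

For the second row the epsilon variant prints a value of roughly 3e7, while the error-code variant prints -2.0. Note also that the epsilon guard does not remove the problem entirely: when X[1] is exactly -0.0000001, the guarded denominator becomes 0.0 again.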
Hi, I followed the examples here: https://openscoring.io/blog/2020/02/23/sklearn_feature_specification_pmml to create 4 continuous domain features and a 5th feature, which I arbitrarily chose to be an expression: the ratio of the first and second columns. Things work when all values are well defined. However, when I modify the dataset to contain None or undefined values, the ExpressionTransformer fails with "ValueError: Input contains NaN, infinity or a value too large for dtype('float32')." (example given).
How can I control missing value / erroneous values in the ExpressionTransformer block? That is, I would like either the missing value replacement from the numeric domain mapper to be applied, or to be able to set missing_value_replacement on it. Is this possible?
Thank you for your help.
-NH