onnx / sklearn-onnx

Convert scikit-learn models and pipelines to ONNX
Apache License 2.0

FunctionTransformer is not supported #609

Open HamzaSaouli opened 3 years ago

HamzaSaouli commented 3 years ago

```
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
      1 # Export the model
      2 initial_type = [('numfeat', FloatTensorType([None, 30]))]
----> 3 model_onnx = convert_sklearn(model, initial_types=initial_type)
      4
      5 # Save it into wanted file

~/.local/lib/python3.6/site-packages/skl2onnx/convert.py in convert_sklearn(model, name, initial_types, doc_string, target_opset, custom_conversion_functions, custom_shape_calculators, custom_parsers, options, dtype, intermediate, white_op, black_op, final_types)
    148
    149     # Infer variable shapes
--> 150     topology.compile()
    151
    152     # Convert our Topology object into ONNX. The outcome is an ONNX model.

~/.local/lib/python3.6/site-packages/skl2onnx/common/_topology.py in compile(self)
    901         self._resolve_duplicates()
    902         self._fix_shapes()
--> 903         self._infer_all_types()
    904         self._check_structure()
    905

~/.local/lib/python3.6/site-packages/skl2onnx/common/_topology.py in _infer_all_types(self)
    754                 shape_calc(operator)
    755             else:
--> 756                 operator.infer_types()
    757
    758     def _resolve_duplicates(self):

~/.local/lib/python3.6/site-packages/skl2onnx/common/_topology.py in infer_types(self)
    220                 "Unable to find a shape calculator for alias '{}' "
    221                 "and type '{}'.".format(self.type, type(self.raw_operator)))
--> 222         shape_calc(self)
    223
    224     @property

~/.local/lib/python3.6/site-packages/skl2onnx/shape_calculators/function_transformer.py in calculate_sklearn_function_transformer_output_shapes(operator)
     15     """
     16     if operator.raw_operator.func is not None:
---> 17         raise RuntimeError("FunctionTransformer is not supported unless the "
     18                            "transform function is None (= identity). "
     19                            "You may raise an issue at "

RuntimeError: FunctionTransformer is not supported unless the transform function is None (= identity). You may raise an issue at https://github.com/onnx/sklearn-onnx/issues.
```
xadupre commented 3 years ago

FunctionTransformer wraps custom code, which is difficult to convert automatically into ONNX. It would be easier if the custom function were written directly with ONNX operators. That's one option: write the custom function with ONNX operators. The second option is to convert that function into a Python operator onnxruntime can use; that's what the package ort-customops does. Both ways would probably benefit from examples to guide users.

xadupre commented 3 years ago

I explored a simpler way to do it, with a syntax very close to numpy; see Numpy API for ONNX and scikit-learn.

paranjapeved15 commented 10 months ago

@xadupre I checked your post but I am not sure I understand. The problem is that onnx converters do not accept FunctionTransformer. Can you please elaborate on how your example solution would solve the issue?

xadupre commented 10 months ago

skl2onnx converts a scikit-learn pipeline into ONNX. For many estimators, skl2onnx has an ONNX implementation of the inference function implemented in scikit-learn; the final ONNX graph just puts all these blocks together. FunctionTransformer, however, wraps a piece of code skl2onnx has no implementation for, so that piece is missing unless the user provides its ONNX implementation.

Now, how do you provide that ONNX implementation of your custom code? That's the main difficulty. The blog post you mention uses a package I no longer maintain, as I broke it into smaller packages; some parts were added to onnx itself. I tried to list the options available today in Many ways to implement a custom graph in ONNX. In your case, the first step is to select the option which fits your need.

Once you have an onnx representation of your python code, I recently added the function add_onnx_graph in PR https://github.com/onnx/sklearn-onnx/pull/1023, which integrates any onnx graph into your pipeline. It is not released yet, but skl2onnx can be installed from github to use it.

paranjapeved15 commented 10 months ago

Thanks for the reply @xadupre! So if I rewrite my custom function to use onnx operators instead of numpy and then call my function using FunctionTransformer, would that work? Or would I need to add my onnx operator function directly to the onnx graph using add_onnx_graph?

paranjapeved15 commented 10 months ago

@xadupre, because FunctionTransformer was giving errors, I was also trying to create a custom transformer and then write an onnx converter for it, similar to https://onnx.ai/sklearn-onnx/auto_tutorial/plot_icustom_converter.html. Do you think this approach might work too?

xadupre commented 10 months ago

Whatever the option, there will be two versions: one for scikit-learn, one for onnx. The numpy API is a way to have the same code produce both versions; otherwise, there are two to maintain. Switching from FunctionTransformer to a custom transformer is easier given the design of skl2onnx. The converter for the custom transformer may be written with any API: with the skl2onnx API, or with another one plus add_onnx_graph. I assume you cannot share your custom function, but maybe you can share a simpler one with a very simple pipeline, and I can write a short example on how to do it. You would then adapt it for the real function.

paranjapeved15 commented 10 months ago
```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from xgboost import XGBClassifier

def calculate(df):
    df['c'] = 100 * (df['a'] - df['b']) / df['b']
    return df

mapper = ColumnTransformer(
    transformers=[
        ("c", FunctionTransformer(calculate), ['a', 'b']),
    ],
    remainder='passthrough',
    verbose_feature_names_out=False,
)
mapper.set_output(transform="pandas")

pipe = Pipeline([("mapper", mapper), ("classifier", XGBClassifier())])
```

Thanks so much for the help @xadupre !

paranjapeved15 commented 10 months ago
```python
from sklearn.base import BaseEstimator, TransformerMixin

class OverpriceCalculator(BaseEstimator, TransformerMixin):

    def calculate_overprice(self, x, y):
        return 100 * (x - y) / y

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        X['c'] = X.apply(
            lambda row: self.calculate_overprice(row.a, row.b), axis=1)
        return X
```

Here is the custom transformer I wrote in case you need it to write the onnx converter.
xadupre commented 10 months ago

Thanks, I'll write the example today.

xadupre commented 10 months ago

I created a PR with an example similar to yours. Feel free to add comments wherever it needs more explanation from me.

paranjapeved15 commented 9 months ago

@xadupre I am now trying to write more custom transformers like the one above, and I need to refer to the various onnx operators when writing converter functions. Is there a documentation page or wiki explaining the available onnx operators?

xadupre commented 9 months ago

You can look into https://onnx.ai/onnx/operators/ or https://github.com/onnx/onnx/blob/main/docs/Operators.md. The first page contains some explanation about onnx: opsets, domains, operators, ...

paranjapeved15 commented 9 months ago

@xadupre I am a bit confused about how to import these. In the PR that you created above (#1042) you imported operators like OnnxSlice from skl2onnx.algebra.onnx_ops, but when I go to that location I don't see the source code for the definitions of these operators. Also, why are the operators on the above page named something like "Slice" while we import OnnxSlice?

xadupre commented 9 months ago

They are dynamically created by the package based on the operator schemas and have the same signatures as the operators: operator Slice becomes class OnnxSlice. When I created this API, onnx was growing at every release and I did not want to update the code at every release. These classes have the same signatures as the methods described at https://github.com/microsoft/onnxscript/blob/main/onnxscript/onnx_opset/_impl/opset18.py (without self).

paranjapeved15 commented 9 months ago

So all onnx operator imports would look like skl2onnx.algebra.onnx_ops.*?

xadupre commented 9 months ago

Yes.

addisonklinke commented 7 months ago

@xadupre thanks for all your documentation on registering custom converters!

Regarding the available operators to use in the operator converter, I saw this list in the ONNX docs. However, it appears to have different namespaces like ai.onnx and ai.onnx.ml, each with their own opset versions. When I inspect an example ONNX pipeline in Netron, I see that both namespaces are imported:

[Netron screenshot showing both ai.onnx and ai.onnx.ml opset imports]

convert_sklearn(target_opset=...) would allow me to alter ai.onnx, but what if I wanted a particular opset from ai.onnx.ml?

EDIT: based on this example I see that target_opset can be a dict where keys represent the different namespaces

```python
model_onnx = convert_sklearn(
    pipe,
    "pipeline_xgboost",
    [("input", FloatTensorType([None, 2]))],
    target_opset={"": 12, "ai.onnx.ml": 2},
)
```