onnx / sklearn-onnx

Convert scikit-learn models and pipelines to ONNX
Apache License 2.0

Misalignment between sklearn and onnx definition of Normalizer with Max norm #793

Open ROC5COR opened 2 years ago

ROC5COR commented 2 years ago

Hi, using sklearn's Normalizer(norm='max') and its ONNX version converted via skl2onnx (also called Normalizer), I get different results when running this layer with sklearn and with onnxruntime. For the same Normalizer layer with the same inputs and parameters, I get negative outputs with sklearn and positive outputs with onnxruntime!

Digging into the sklearn code, I found that sklearn uses the following formula: sklearn_norm_code

And digging into the onnxruntime code, I found that the following code is used: onnxruntime_norm_code

The main difference is that sklearn takes the absolute value of each element before computing the max, while onnxruntime does not. onnxruntime is consistent with the ONNX documentation: https://github.com/onnx/onnx/blob/master/docs/Operators-ml.md#ai.onnx.ml.Normalizer
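To make the difference concrete, here is a minimal NumPy sketch of the two definitions as described above. The function names max_norm_with_abs and max_norm_without_abs are hypothetical and only mirror the formulas discussed in this issue; they are not the actual sklearn or onnxruntime code, and zero-norm handling is left out.

```python
import numpy as np

def max_norm_with_abs(X):
    # Divide each row by max(|x|): the behaviour described above for sklearn.
    norms = np.max(np.abs(X), axis=1, keepdims=True)
    return X / norms

def max_norm_without_abs(X):
    # Divide each row by max(x), with no absolute value:
    # the behaviour described above for the ONNX Normalizer.
    norms = np.max(X, axis=1, keepdims=True)
    return X / norms

# A row with only negative values makes the sign flip visible.
X = np.array([[-4.0, -2.0, -1.0]])
with_abs = max_norm_with_abs(X)        # signs are preserved
without_abs = max_norm_without_abs(X)  # max is negative, so signs flip

print(with_abs)
print(without_abs)
```

With an all-negative row, the first version keeps the outputs negative while the second turns them positive, which matches the symptom reported above.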

The question is: what is the correct implementation of the Normalizer? Should it use absolute values or not?

Thanks

xadupre commented 2 years ago

The converter should produce an ONNX graph equivalent to the sklearn model, whether or not it uses the Normalizer operator. A normalizer from scikit-learn is not always converted into a Normalizer from ONNX, so having a different definition for Normalizer is not an issue in itself. The only objective is to produce an ONNX model with no discrepancies.
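As an illustration of this point, the sketch below emulates in NumPy how an equivalent graph could be composed from elementary operators instead of Normalizer. The helpers onnx_abs, onnx_reduce_max and onnx_div are hypothetical stand-ins for the ONNX operators Abs, ReduceMax and Div; this is an assumption about one possible decomposition, not a description of what skl2onnx actually emits.

```python
import numpy as np

# Hypothetical NumPy stand-ins for the ONNX operators Abs, ReduceMax and Div.
def onnx_abs(x):
    return np.abs(x)

def onnx_reduce_max(x, axis):
    # keepdims=True so the result broadcasts against the input rows.
    return np.max(x, axis=axis, keepdims=True)

def onnx_div(a, b):
    return a / b

def normalizer_max_graph(X):
    # X / ReduceMax(Abs(X)) per row, i.e. the abs-based max norm.
    return onnx_div(X, onnx_reduce_max(onnx_abs(X), axis=1))

X = np.array([[1.0, -2.0], [-3.0, -1.5]])
Y = normalizer_max_graph(X)
print(Y)
```

Composed this way, the result follows the abs-based formula regardless of how the Normalizer operator itself is defined.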

ROC5COR commented 2 years ago

Hello, thanks for your reply. I agree with you that the sklearn model and the produced ONNX graph should be aligned. But in my simple example (a sklearn Pipeline containing only a Normalizer(norm='max')), I do not get the same results when running the sklearn model and the ONNX one.

What I see is that the ONNX graph produced is not equivalent to the sklearn model. For the same input, I get 1 as the output of my sklearn model and -1 as the output of my ONNX model.