onnx / sklearn-onnx

Convert scikit-learn models and pipelines to ONNX
Apache License 2.0

Version 1.9 Normalizer inputs dimension #679

Closed by busFred 3 years ago

busFred commented 3 years ago

As I mentioned in #678, the converter for sklearn.preprocessing.Normalizer does not support the double type, so I switched to skl2onnx.algebra.onnx_ops.Normalizer. I guess that skl2onnx.algebra.onnx_ops.Normalizer is a wrapper around ai.onnx.ml.Normalizer. However, the documentation for ai.onnx.ml.Normalizer clearly states that the input X can be a tensor of shape [N,C] or [C]. But when I use the following code in the converter function:

input: Variable = operator.inputs[0]
np_dtype = guess_numpy_type(input.type)
# normalize input
normalize_op: Union[OnnxOperator, Variable] = input
normalize_op = OnnxNormalizer(input, norm="L2", op_version=op_version)

And then run the transformer with the following:

try:
    onx = to_onnx(skm, X.astype(np.float64))
except Exception as e:
    traceback.print_exception(type(e), e, e.__traceback__)

It spits out the following error:

Traceback (most recent call last):
  File "/home/fred/Documents/research/sklearn_plugins/test/sklearn_plugins/cluster/test_spherical_kmeans_export.py", line 64, in <module>
    onx = to_onnx(skm, X.astype(np.float64))
  File "/home/fred/anaconda3/envs/sklearn_plugins/lib/python3.8/site-packages/skl2onnx-1.9.0-py3.8.egg/skl2onnx/convert.py", line 207, in to_onnx
    return convert_sklearn(model, initial_types=initial_types,
  File "/home/fred/anaconda3/envs/sklearn_plugins/lib/python3.8/site-packages/skl2onnx-1.9.0-py3.8.egg/skl2onnx/convert.py", line 157, in convert_sklearn
    onnx_model = convert_topology(topology, name, doc_string, target_opset,
  File "/home/fred/anaconda3/envs/sklearn_plugins/lib/python3.8/site-packages/skl2onnx-1.9.0-py3.8.egg/skl2onnx/common/_topology.py", line 1180, in convert_topology
    conv(scope, operator, container)
  File "/home/fred/anaconda3/envs/sklearn_plugins/lib/python3.8/site-packages/skl2onnx-1.9.0-py3.8.egg/skl2onnx/common/_registration.py", line 26, in __call__
    return self._fct(*args)
  File "/home/fred/Documents/research/sklearn_plugins/src/sklearn_plugins/cluster/_onnx_transform.py", line 102, in spherical_kmeans_converter
    proj_op.add_to(scope=scope, container=container)
  File "/home/fred/anaconda3/envs/sklearn_plugins/lib/python3.8/site-packages/skl2onnx-1.9.0-py3.8.egg/skl2onnx/algebra/onnx_operator.py", line 548, in add_to
    self.state.run()
  File "/home/fred/anaconda3/envs/sklearn_plugins/lib/python3.8/site-packages/skl2onnx-1.9.0-py3.8.egg/skl2onnx/algebra/graph_state.py", line 414, in run
    v = self._get_var_name(i, False, index=None)
  File "/home/fred/anaconda3/envs/sklearn_plugins/lib/python3.8/site-packages/skl2onnx-1.9.0-py3.8.egg/skl2onnx/algebra/graph_state.py", line 124, in _get_var_name
    var.add_to(self.scope, self.container, operator=operator,
  File "/home/fred/anaconda3/envs/sklearn_plugins/lib/python3.8/site-packages/skl2onnx-1.9.0-py3.8.egg/skl2onnx/algebra/onnx_operator.py", line 950, in add_to
    self.state.run()
  File "/home/fred/anaconda3/envs/sklearn_plugins/lib/python3.8/site-packages/skl2onnx-1.9.0-py3.8.egg/skl2onnx/algebra/graph_state.py", line 458, in run
    shape_calc(sub_op)
  File "/home/fred/anaconda3/envs/sklearn_plugins/lib/python3.8/site-packages/skl2onnx-1.9.0-py3.8.egg/skl2onnx/shape_calculators/svd.py", line 26, in calculate_sklearn_truncated_svd_output_shapes
    raise RuntimeError('Only 2-D tensor(s) can be input(s).')
RuntimeError: Only 2-D tensor(s) can be input(s).

To investigate the error, I printed the shape of the sample dummy input X that I pass to the to_onnx function:

print(X.shape)

and it prints out

(600, 2)

xadupre commented 3 years ago

Let me know if PR #680 addresses your issue. It fixes normalization with norm L2 and double. If the shape is unexpected, it is possible to reshape before and after. ONNX allows reshape with -1: Reshape(M, (-1, 4)) reshapes M into 4 columns and infers the corresponding number of rows.
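To make the -1 rule concrete, here is a minimal pure-Python sketch of how a single -1 entry in a target shape can be resolved from the element count. The helper name resolve_reshape is hypothetical and is not part of onnx or skl2onnx. The sketch covers only the -1 case and ignores the special meaning ONNX Reshape gives to 0.

```python
from math import prod


def resolve_reshape(input_shape, target_shape):
    """Return target_shape with a single -1 replaced by the inferred size.

    Hypothetical helper for illustration only; it is not an onnx or
    skl2onnx function.
    """
    total = prod(input_shape)
    if list(target_shape).count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    # Product of the explicitly given dimensions (1 if there are none).
    known = prod(d for d in target_shape if d != -1)
    if -1 in target_shape:
        if known == 0 or total % known != 0:
            raise ValueError("cannot infer the -1 dimension")
        return tuple(total // known if d == -1 else d for d in target_shape)
    if known != total:
        raise ValueError("element count does not match")
    return tuple(target_shape)


# A (600, 2) input holds 1200 values, so 4 columns leave 300 rows.
print(resolve_reshape((600, 2), (-1, 4)))  # (300, 4)
# A 1-D input of length 2 can be viewed as a single row.
print(resolve_reshape((2,), (1, -1)))  # (1, 2)
```

The same idea applies to reshaping before and after a node that only accepts 2-D input: reshape to (-1, C) first, then restore the original shape afterwards.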

busFred commented 3 years ago

I can now normalize the input with OnnxSubEstimator(Normalizer(norm="l2"), normalize_op, op_version=op_version). But using the native ai.onnx.ml.Normalizer via OnnxNormalizer(input, norm="L2", op_version=op_version) still generates the shape error. I think PR #680 solves my problem for now, but eventually I still want to migrate to the native ai.onnx.ml.Normalizer.

xadupre commented 3 years ago

Native Normalizer does not support double and is very strict about input shapes (see the Normalizer operator documentation). That's why the converter switches to matrix operations.
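For reference, L2 normalization written as plain arithmetic looks roughly like the sketch below. It is only an illustration of the math in pure Python (whose floats are double precision), not the node sequence the converter actually emits. The function name l2_normalize_rows is hypothetical. Leaving all-zero rows unchanged follows scikit-learn's behaviour.

```python
import math


def l2_normalize_rows(rows):
    """Divide every row by its Euclidean norm.

    Hypothetical helper for illustration; rows is a list of lists of floats.
    """
    normalized = []
    for row in rows:
        # Square, sum, and take the square root: the per-row L2 norm.
        norm = math.sqrt(sum(v * v for v in row))
        # Avoid division by zero: an all-zero row is left unchanged.
        scale = norm if norm > 0.0 else 1.0
        normalized.append([v / scale for v in row])
    return normalized


X = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
for row in l2_normalize_rows(X):
    print(row)
```

In ONNX terms the same steps map onto elementwise multiply, a sum reduction over axis 1, a square root, and a division, all of which accept double tensors.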