onnx / sklearn-onnx

Convert scikit-learn models and pipelines to ONNX
Apache License 2.0

Error converting KNeighborsClassifier with numeric features #482

Closed marinagre-px closed 4 years ago

marinagre-px commented 4 years ago

When converting KNN Classifier with numeric features, i.e.

    numeric_features = ['path_len']
    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())])

AND

    classifier = KNeighborsClassifier(n_neighbors=5, weights='distance')

the following error is thrown:

    2020-06-09 14:12:16.709549 [E:onnxruntime:, sequential_executor.cc:281 Execute] Non-zero status code returned while running Scan node. Name:'Sc_Scan' Status Message: Subgraph must have the shape set for all outputs but next_out did not.
    Traceback (most recent call last):
      File "/Users/marinagrechuhin/Development/Development-Tests/test_onnx/plot_complex_pipeline.py", line 349, in <module>
        pred_onx = sess.run(None, inputs)
      File "/Users/marinagrechuhin/Development/Development-Tests/test_onnx/venv/lib/python3.8/site-packages/onnxruntime/capi/session.py", line 111, in run
        return self._sess.run(output_names, input_feed, run_options)
    onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Scan node. Name:'Sc_Scan' Status Message: Subgraph must have the shape set for all outputs but next_out did not.

If I use a LinearRegression model in place of the KNN classifier, the conversion works.

xadupre commented 4 years ago

That's a bug. In the meantime, can you try the following option?

    convert_sklearn(..., options={KNeighborsClassifier: {'optim': 'cdist'}})

It replaces the part that fails with a cdist operator (faster, but this operator is not part of the official ONNX specification).
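For readers unfamiliar with the term, a cdist operator computes all pairwise distances between two sets of points, which is the core of a nearest-neighbours prediction. The pure-Python sketch below is only an illustration of that idea together with a distance-weighted vote (as with weights='distance'); it assumes Euclidean distance, the function names are hypothetical, and it is not the code that skl2onnx or onnxruntime actually runs.

```python
import math
from collections import defaultdict


def cdist(queries, references):
    """Pairwise Euclidean distances: result[i][j] = ||queries[i] - references[j]||."""
    return [[math.dist(q, r) for r in references] for q in queries]


def knn_predict(queries, references, labels, n_neighbors=5):
    """Distance-weighted k-nearest-neighbours vote (illustrative only)."""
    predictions = []
    for row in cdist(queries, references):
        # Indices of the n_neighbors closest reference points.
        nearest = sorted(range(len(row)), key=row.__getitem__)[:n_neighbors]
        # Exact matches (distance 0) take the whole vote, avoiding division by zero.
        zero = [j for j in nearest if row[j] == 0.0]
        votes = defaultdict(float)
        for j in (zero or nearest):
            votes[labels[j]] += 1.0 if zero else 1.0 / row[j]
        predictions.append(max(votes, key=votes.get))
    return predictions


references = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["a", "a", "b", "b"]
queries = [(0.2, 0.1), (5.5, 5.0)]
predicted = knn_predict(queries, references, labels, n_neighbors=3)
print(predicted)
```

Computing the whole distance matrix in one operator is what lets the converter avoid the Scan subgraph that fails in the traceback above.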

xadupre commented 4 years ago

I think I'll need the version of every package as I can't replicate the issue. Here is my code:

    def test_model_knn_iris_classifier_multi_reg_pipeline(self):
        iris = datasets.load_iris()
        X = iris.data.astype(numpy.float32)
        y = iris.target.astype(numpy.float32)
        #y = numpy.vstack([y, 1 - y, y + 10]).T
        knn = KNeighborsClassifier(
            n_neighbors=5, weights='distance')
        model = Pipeline(steps=[
            ('imputer', SimpleImputer(strategy='median')),
            ('scaler', StandardScaler()),
            ('knn', knn)])
        model.fit(X[:13], y[:13])
        onx = to_onnx(model, X[:1],
                      # options={id(model): {'optim': 'cdist'}},
                      target_opset=TARGET_OPSET)
        dump_data_and_model(
            X.astype(numpy.float32)[:7],
            model, onx, methods=["predict", "predict_proba"],
            basename="SklearnKNeighborsClassifierMImp")

marinagre-px commented 4 years ago

Thank you for the quick response! These are the versions:

    numpy: 1.18.4
    scikit-learn: 0.23.1
    onnx: 1.7.0
    onnxruntime: 1.3.0
    skl2onnx: 1.6.1
    python: 3.8
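A version list like this one can be gathered in one step. The snippet below is a small sketch using only the standard library (importlib.metadata, available from Python 3.8); the helper name collect_versions is hypothetical, and the package names are the distribution names as they would appear on PyPI.

```python
import platform
from importlib import metadata

# Distribution names of the packages involved in this issue.
PACKAGES = ["numpy", "scikit-learn", "onnx", "onnxruntime", "skl2onnx"]


def collect_versions(packages=PACKAGES):
    """Return {name: version}, marking packages that are not installed."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```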

xadupre commented 4 years ago

I suggest trying the version of skl2onnx from the master branch, or this one released on GitHub. There have been many changes since then; I tried with this one and the issue seems to be fixed.

marinagre-px commented 4 years ago

> That's a bug. In the meantime, can you try the following option?
>
>     convert_sklearn(..., options={KNeighborsClassifier: {'optim': 'cdist'}})
>
> It replaces the part that fails with a cdist operator (faster, but this operator is not part of the official ONNX specification).

Thank you, this worked.

marinagre-px commented 4 years ago

Also, using version 1.7.0 without the options worked. Thanks.