microsoft / hummingbird

Hummingbird compiles trained ML models into tensor computation for faster inference.
MIT License

onnxruntime.capi.onnxruntime_pybind11_state.Fail ONNXRuntimeError #770

Open ksaur opened 1 month ago

ksaur commented 1 month ago

Onnxruntime 1.18.0 was released 3 hours ago on PyPI, but I don't see it yet on their GitHub.

From CI:

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/test_lightgbm_converter.py::TestLGBMConverter::test_lightgbm_onnx - onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (/_operators.0/Transpose) Op (Transpose) [TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {
FAILED tests/test_sklearn_gbdt_converter.py::TestSklearnGradientBoostingConverter::test_varying_batch_sizes - onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (/_operators.0/Transpose) Op (Transpose) [TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {
===== 2 failed, 596 passed, 66 skipped, 412 warnings in 201.43s (0:03:21) ======

Both are related to Op (Transpose)
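For context on the error text: ONNX's Transpose type inference checks that the `perm` attribute is a valid permutation of the input's rank, and the empty `{` in the message suggests the input shape was unknown at that point, so the check cannot succeed. A rough NumPy analogue of the `perm` attribute (a sketch for illustration, not ORT's actual validation code):

```python
import numpy as np

x = np.arange(6, dtype=np.float32).reshape(2, 3)

# ONNX Transpose with perm=[1, 0] corresponds to np.transpose(x, axes=(1, 0)):
y = np.transpose(x, axes=(1, 0))
print(y.shape)  # (3, 2)

# perm must be a permutation of range(rank). If the rank doesn't match
# (analogous to validating perm against an unknown/empty input shape),
# the operation is rejected outright:
try:
    np.transpose(x, axes=(1, 0, 2))
except ValueError as e:
    print("invalid axes:", e)
```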

ksaur commented 1 month ago

Rearranging our test a bit for debugging:

    def test_varying_batch_sizes(self):
        model = GradientBoostingClassifier(n_estimators=10, max_depth=3)
        np.random.seed(0)
        X = np.random.rand(100, 200)
        X = np.array(X, dtype=np.float32)
        y = np.random.randint(2, size=100)

        X_test = np.random.rand(2, 200)
        X_test = np.array(X_test, dtype=np.float32)

        model.fit(X, y)

        model_probs = model.predict_proba(X_test) 

        # failure is here on the HB convert line
        # conv_model  = hummingbird.ml.convert(model, "onnx", X, extra_config={})  
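Pulled out of the test class, a self-contained version of the same setup (a sketch; the Hummingbird conversion line is left commented out since that is the failing step):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

np.random.seed(0)
X = np.random.rand(100, 200).astype(np.float32)
y = np.random.randint(2, size=100)
X_test = np.random.rand(2, 200).astype(np.float32)

model = GradientBoostingClassifier(n_estimators=10, max_depth=3)
model.fit(X, y)

# scikit-learn itself handles the smaller batch fine; the failure is in the
# Hummingbird -> ONNX conversion (uncomment with hummingbird-ml installed):
# import hummingbird.ml
# conv_model = hummingbird.ml.convert(model, "onnx", X, extra_config={})

model_probs = model.predict_proba(X_test)
print(model_probs.shape)  # (2, 2)
```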

The ORT team mentioned that there was a change to the transpose opset and that "the input shape seems missing".

When I run the code above, at the `torch.onnx.export()` call in `_topology.py` I get some warnings:

torch/onnx/utils.py:1702: UserWarning: The exported ONNX model failed ONNX shape inference. The model will not be executable by the ONNX Runtime. If this is unintended and you believe there is a bug, please report an issue at https://github.com/pytorch/pytorch/issues. 

Error reported by strict ONNX shape inference: [ShapeInferenceError] Inference error(s): (op_type:Reshape, node name: /_operators.0/Reshape_4): 
[ShapeInferenceError] Dimension could not be inferred: incompatible shapes
(op_type:ReduceSum, node name: /_operators.0/ReduceSum): [TypeInferenceError] Input 0 expected to have type but instead is null
(op_type:Add, node name: /_operators.0/Add): [TypeInferenceError] Input 0 expected to have type but instead is null
(op_type:Sigmoid, node name: /_operators.0/Sigmoid): [TypeInferenceError] Input 0 expected to have type but instead is null
 (Triggered internally at ../torch/csrc/jit/serialization/export.cpp:1488.)
  _C._check_onnx_proto(proto)

Maybe it does not like our dynamic_axes?

Which I guess explains the `{` (an empty/unknown shape, i.e. null) in the error message "[TypeInferenceError] Invalid attribute perm {1, 0}, input shape = {".

ksaur commented 1 month ago

dynamic_axes_cfg in this test is {'input_0': {0: 'sym'}, 'variable': {0: 'sym'}}.

If I remove the dynamic axes here (e.g. comment out `dynamic_axes=dynamic_axes_cfg` in the `torch.onnx.export` call) and change my test code above to `X_test = np.random.rand(100, 200)` (instead of `(2, 200)`, which won't work without dynamic axes), it passes just fine.