microsoft / hummingbird

Hummingbird compiles trained ML models into tensor computation for faster inference.
MIT License
3.32k stars, 274 forks

Varying batch sizes? #653

Closed stillmatic closed 1 year ago

stillmatic commented 1 year ago

Hi, I have some XGB/LGBM models as SKL models, and have been converting them to ONNX via Hummingbird successfully. One issue is that when converting to ONNX, Hummingbird expects a test input. This test input has a batch size associated with it, but I want to pass arbitrary batch sizes to the ONNX runtime later. Instead, the ONNX runtime insists on exactly the batch size that Hummingbird used when tracing the model.

Is there a good way to automatically reshape the ONNX model to accept varying/arbitrary batch sizes?

This hack appears to work, but I have not thought too carefully about it: https://github.com/stillmatic/hummingbird/commit/573c8b7dac2c8e7009727df8253b08285e06a455

ksaur commented 1 year ago

Hi @stillmatic!

I thought we had fixed it so that the onnx model accepted symbolic dimensions (see here), but it's possible that onnx changed, or that your versions are different than ours. What versions of ONNX are you using?

For example, when I run LGBM-ONNXML-example.ipynb and do print(onnx_model.model) at the end of it, the output section shows the symbolic dimension:

output {
  name: "probabilities"
  type {
    tensor_type {
      elem_type: 1
      shape {
        dim {
          dim_param: "sym"
        }
        dim {
          dim_value: 2
        }
      }
    }
  }
}

And if I modify the size of X:

X = np.array(np.random.rand(20000, 28), dtype=np.float32)
onnx_model.predict(X)

it still works. Or do you mean something else?

stillmatic commented 1 year ago

I do see similar results: the input and output nodes have symbolic batch dimensions, but internal nodes (GatherElements) still expect a fixed batch size. Specifically, I get this error:

Non-zero status code returned while running GatherElements node. Name:'/_operators.0/GatherElements' Status Message: GatherElements op: 'indices' shape should have values within bounds of 'data' shape. Invalid value in indices shape is: 10000

Here is the LGBM example with a fixed test input size (20).

image

Converting without a test input still produces a model that expects a batch of 10000, which implies to me that Hummingbird is passing its own input array of size 10000.

image

requirements:
    "onnxruntime>=1.0.0",
    "onnxmltools>=1.6.0,<=1.11.0",
    "skl2onnx>=1.7.0,<=1.12.0",
installed:
    ort 1.13.1
    onnxmltools 1.11.1
    onnx 1.12.0
    skl2onnx 1.12

The only mismatch I see there is that onnxmltools is one point release above the requested 1.11.0 maximum, but I doubt that is the cause of the error.
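The comparison above can be reproduced with a small check. This is only a sketch using the Python standard library: the helper names `parse_version` and `out_of_range` are made up for illustration, "ort" is assumed to mean onnxruntime, and the version strings are copied from the lists in this comment.

```python
# Minimal sketch: compare installed versions against the requested ranges.
# Helper names are hypothetical; version strings come from the comment above.

def parse_version(version):
    """Turn '1.12' or '1.11.1' into a 3-tuple of ints, padding with zeros."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))

# (minimum, maximum) per package; None means no upper bound was requested.
REQUIREMENTS = {
    "onnxruntime": ("1.0.0", None),
    "onnxmltools": ("1.6.0", "1.11.0"),
    "skl2onnx": ("1.7.0", "1.12.0"),
}

# Assumption: "ort 1.13.1" in the installed list refers to onnxruntime.
INSTALLED = {
    "onnxruntime": "1.13.1",
    "onnxmltools": "1.11.1",
    "skl2onnx": "1.12",
}

def out_of_range(requirements, installed):
    """Return the names of packages whose installed version is outside its range."""
    mismatched = []
    for name, (low, high) in requirements.items():
        version = parse_version(installed[name])
        too_old = version < parse_version(low)
        too_new = high is not None and version > parse_version(high)
        if too_old or too_new:
            mismatched.append(name)
    return mismatched

print(out_of_range(REQUIREMENTS, INSTALLED))  # prints ['onnxmltools']
```

Only onnxmltools falls outside its range, and only by the final component of the version number.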

stillmatic commented 1 year ago

Testing your 20_000 input on this model, which expects 10_000, I am able to run it. However, this creates some very odd behavior!

image

First, I think the model runs only because the new batch size is larger than the original input size. Trying a smaller batch size, e.g. 100, causes the error shown in my previous comment.

Second, the result has the original batch size of 10_000 instead of the expected 20_000.
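Both observations fit a shape check of the following kind. This is a pure-Python sketch, not ONNX Runtime's actual code. It assumes, based on the error message, that every non-axis dimension of 'indices' must not exceed the matching dimension of 'data', and it relies on the GatherElements rule that the output has the shape of 'indices'. The function names, the feature width of 28, and the `(10_000, 1)` indices shape are hypothetical, chosen only to mirror the numbers in this thread.

```python
# Sketch of a GatherElements-style shape check (an assumption, not ORT source).

TRACED_BATCH = 10_000  # batch size assumed to be baked into 'indices' at trace time

def gather_elements_output_shape(data_shape, indices_shape, axis):
    """Return the output shape, or raise ValueError if 'indices' is out of bounds."""
    if len(data_shape) != len(indices_shape):
        raise ValueError("'data' and 'indices' must have the same rank")
    for dim, (data_dim, idx_dim) in enumerate(zip(data_shape, indices_shape)):
        # Only the non-axis dimensions are checked against 'data'.
        if dim != axis and idx_dim > data_dim:
            raise ValueError(
                "'indices' shape should have values within bounds of 'data' "
                f"shape. Invalid value in indices shape is: {idx_dim}"
            )
    # The output always takes the shape of 'indices', not of 'data'.
    return tuple(indices_shape)

def run_with_batch(batch_size):
    """Simulate a run where 'data' follows the input batch but 'indices' is fixed."""
    return gather_elements_output_shape(
        data_shape=(batch_size, 28),
        indices_shape=(TRACED_BATCH, 1),
        axis=1,
    )

print(run_with_batch(20_000))  # prints (10000, 1): runs, but keeps the traced batch
try:
    run_with_batch(100)
except ValueError as err:
    print(err)  # the fixed 10000 exceeds the data batch of 100
```

Under these assumptions, a larger batch passes the check but returns only 10_000 rows, while a smaller batch fails because the fixed 10_000 no longer fits.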

ksaur commented 1 year ago

You're right, this appears to be a bug. I see that we are using sym, while all the ONNX tests use -1, so it's possible that this stopped working.

I'll add this to the buglist! I hope to have time to work on it within the next week or so, but in the meantime if you come up with something (in your starter PR) please do share!

ksaur commented 1 year ago

@stillmatic, can you open a PR with your commit https://github.com/stillmatic/hummingbird/commit/573c8b7dac2c8e7009727df8253b08285e06a455?

stillmatic commented 1 year ago

Will do! I'll take a stab at updating the tests as part of that too.