microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License
14.7k stars 2.93k forks source link

Slice behavior wrong with negative step and end = INT_MAX #11107

Open garymm opened 2 years ago

garymm commented 2 years ago

Describe the bug ORT's Slice tests assert that Slice(x, starts = [-1], ends = [INT_MAX], steps = [-1]) == Reverse(x).

ONNX spec says:

For slicing to the end of a dimension with unknown size, it is recommended to pass in INT_MAX when slicing forward and INT_MIN when slicing backward.

I haven't yet tested to see if the behavior is correct with INT_MIN, but that should be checked as well.

Urgency None.

To Reproduce Build ORT and run onnxruntime_test_all --gtest_filter='SliceTest.Slice1D_ReverseAllAxes_1'. Currently the test triggers a shape inference failure which results in a warning being logged.

Expected behavior Slice(x, starts = [-1], ends = [INT_MAX], steps = [-1]) == <empty tensor> Slice(x, starts = [-1], ends = [INT_MIN], steps = [-1]) == Reverse(x)

Both cases should be tested in ORT and not result in shape inference failures.

Additional context Add any other context about the problem here. If the issue is about a particular model, please share the model details as well to facilitate debugging.

hariharans29 commented 2 years ago

1) Slice(x, starts = [-1], ends = [INT_MIN], steps = [-1]) == x

When you say == x above in the expected behavior, I think you mean Reverse(x) because essentially that is what that Slice operation is performing - reversing x.

2) When I did the Slice-10 implementation, the spec had this ambiguous comment here: "For slicing to the end of a dimension with unknown size, it is recommended to pass in INT_MAX". The (obvious but unstated) assumption is that this is for slicing forward and by extension, INT_MIN should be used while slicing in reverse. But since this wasn't explicitly stated in the spec, the implementation was written such that INT_MAX was a "special" end value (like Numpy's None) to indicate the "end" of a dimension (while slicing in any direction). I think I asked for clarification on this when the op was versioned for opset-11 (I will dig up the old ONNX PR) and that is why we see this line in the spec now- "For slicing to the end of a dimension with unknown size, it is recommended to pass in INT_MAX when slicing forward and 'INT_MIN' when slicing backward."

I guess we could clean up that special handling of INT_MAX while slicing backwards and make life easier or we could make this ugly special casing limited to opset-10 since opset-11 and onwards no longer has this ambiguity

3) I haven't yet tested to see if the behavior is correct with INT_MIN, but that should be checked as well.

This should work just fine. It is the very next test in the list - https://github.com/microsoft/onnxruntime/blob/39d1b9e1c176c29d9d6814a8625e675ab39f73dc/onnxruntime/test/providers/cpu/tensor/slice_op.test.cc#L490

hariharans29 commented 2 years ago

Special-casing with associated comment: https://github.com/microsoft/onnxruntime/blob/e23a22451880a1ba1f46b2af6a71bb4a1079108b/onnxruntime/core/providers/cpu/tensor/slice_helper.h#L139

garymm commented 2 years ago

When you say == x above in the expected behavior, I think you mean Reverse(x)

Yes! Updated the issue desc.

I agree with your interpretation of the Slice-10 spec. If we're agreed then I guess we should also:

hariharans29 commented 2 years ago

When you say, "change the ONNX shape inference code to match", do you mean change the ONNX shape inference for opset-10 Slice alone to treat INT_MAX as a special value (whatever be the direction of slicing) just like the runtime ?

garymm commented 2 years ago

Yes.

hariharans29 commented 2 years ago

Sure. I guess that works.

hariharans29 commented 2 years ago

@garymm : I will update Slice's behavior so that the "special-casing" of INT_MAX is only done for Slice-V10.

I read the latest Slice's spec and I have a couple of things to call out (Please correct me if I am wrong):

https://github.com/onnx/onnx/issues/4115 https://github.com/onnx/onnx/issues/4116

CC: @jcwchen

garymm commented 2 years ago

@hariharans29 discussed this with @gramalingam and we decided that the special casing in ORT is wrong even for Slice-10. The spec doesn't really say that INT_MAX should be treated as a special flag. The behavior going all the way to the end of an axis is meant to be a consequence of setting an index that's larger than all the valid indices. Special handling of INT_MAX could be problematic if the end is int32 and the axis has more than INT32_MAX elements. So we think the behavior should be consistent for all op sets, and that the current behavior is wrong for negative step. Can you please update the implementation in accordance with this interpretation?

hariharans29 commented 2 years ago

Sure.

I don't feel very strongly about this either way. It was just one of those spot decisions taken 3 years ago while implementing an ambiguously worded spec. :) I will fix the special casing while reverse slicing.

skottmckay commented 2 years ago

The spec doesn't really say that INT_MAX should be treated as a special flag.

I think "_For slicing to the end of a dimension with unknown size, it is recommended to pass in INTMAX" in the opset 10 Slice spec is open to interpretation, as that sounds like a special flag to me given it doesn't qualify the statement with any mention of the direction. Opset 11 resolved that ambiguity.

Whilst making the behavior consistent for all op sets is desirable, doesn't that ignore potentially breaking an existing model that relies on the current opset 10 behavior? Products using ORT that have backwards compatibility guarantees may disagree with a change to that.

garymm commented 2 years ago

Both tensorflow-onnx and torch.onnx seem to have identical behavior for opset versions 10 and 11, so it seems pretty unlikely that anyone is relying on this difference to me. And even if someone is relying on it, IIUC they could also be relying on it for opset version 11 because the current ORT code doesn't differentiate. So if you want to maintain backwards compatibility, you're going to have to explicitly go against the ONNX spec.

hariharans29 commented 2 years ago

@skottmckay "Products using ORT that have backwards compatibility guarantees may disagree with a change to that." - I believe you are referring to WinML ? @pranavsharma : Should we check with WinML to see if this change in behavior to Slice-10 (which was based on my interpretation of the ambiguous V10 spec of Slice) will break anyone ?

If it is agreed that this change shouldn't affect anyone, #11132 is ready for a review.

pranavsharma commented 2 years ago

WinML is particularly sensitive to breaking changes esp. given that a version of ORT is in the OS now. I don't think anyone can authoritatively say if a customer will be impacted by this. I would err on the side of caution though and be conservative.

garymm commented 2 years ago

Fully maintaining backwards-compatibility means clearly diverging from the ONNX spec for opset version 11, which seems bad to me. And if you're not going to break BC for version 11, it seems better to me to just make it consistent. Just my two cents, you're the owners of this, so do as you think is best.