Closed: alexander-camuto closed this 1 year ago
Mmm... Is it really ? I'm obviously struggling a bit to make sense of the ONNX spec.
https://github.com/onnx/onnx/blob/main/docs/Operators.md#gather
So in this case, input is of rank r=3. Indices is of rank q=0, so output should be of rank q + (r - 1) = 2, right ?
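For reference, the rank rule in the linked Gather spec can be sketched as a shape computation: the gathered axis is replaced by the full shape of the indices, so a rank-0 index removes that axis. This is only an illustrative sketch; `gather_output_shape` is a hypothetical helper, not part of ONNX or tract, and the example shape [1,1,3] is the one discussed later in this thread.

```python
def gather_output_shape(data_shape, indices_shape, axis=0):
    """Output shape of an ONNX-style Gather (hypothetical helper for illustration).

    The axis being gathered is replaced by the whole indices shape, so the
    output rank is q + (r - 1), with r = len(data_shape), q = len(indices_shape).
    """
    r = len(data_shape)
    if r < 1:
        raise ValueError("data must have rank >= 1")
    if not -r <= axis < r:
        raise ValueError("axis out of range")
    axis %= r  # normalise negative axes
    return tuple(data_shape[:axis]) + tuple(indices_shape) + tuple(data_shape[axis + 1:])


# Rank-0 (scalar) indices: axis 1 disappears, rank goes from 3 to 2.
scalar_case = gather_output_shape((1, 1, 3), (), axis=1)

# Rank-1 indices holding a single element: axis 1 is kept with size 1.
vector_case = gather_output_shape((1, 1, 3), (1,), axis=1)

print(scalar_case, vector_case)
```

Under this reading, whether the axis survives depends only on whether the exported index is a true scalar or a one-element vector.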
I see what you mean.
I've seen rank be used pretty loosely, and passing singleton (non-matrix) indices like this is not even defined behaviour in torch or tf:
>>> import torch
>>> t = torch.tensor([[1, 2], [3, 4]])
>>> torch.gather(t, 1, torch.tensor([[0, 0], [1, 0]]))
tensor([[1, 1],
        [4, 3]])
>>> torch.gather(t, 1, torch.tensor([[0, 0], [1, 0]])).shape
torch.Size([2, 2])
>>> torch.gather(t, 1, torch.tensor([0])).shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Index tensor must have the same number of dimensions as input tensor
I actually think your interpretation here is correct as the subsequent shapes / axes implied by the onnx graph don't make sense without the deleted axis.
This is the node in question:
Which is then sliced along axis 1:
This wouldn't really make sense for an input of size [1,1,3] but does for an input of size [1,3]. The latter happens if we interpret the singleton index as being of rank 0 -- and delete the gathered axis accordingly.
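A minimal pure-Python sketch of the two readings, using nested lists in place of tensors. The `take` and `shape` helpers are hypothetical, the data values are made up, and the slice bounds `1:2` are an assumption chosen only to show why a slice along axis 1 is meaningful for [1,3] but degenerate for [1,1,3]; the actual slice in the model may differ.

```python
def shape(x):
    """Shape of a rectangular nested list; a scalar has shape ()."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        if not x:
            break
        x = x[0]
    return tuple(dims)


def take(data, index, axis):
    """Gather along `axis`: an int index drops the axis, a list index keeps it."""
    if axis == 0:
        if isinstance(index, list):
            return [data[i] for i in index]
        return data[index]
    return [take(sub, index, axis - 1) for sub in data]


data = [[[10, 20, 30]]]          # shape (1, 1, 3)

dropped = take(data, 0, axis=1)  # index read as rank 0: gathered axis deleted
kept = take(data, [0], axis=1)   # index read as rank 1: gathered axis kept

# Hypothetical follow-up slice along axis 1 (bounds assumed for illustration).
sliced_dropped = [row[1:2] for row in dropped]  # picks one of the 3 values
sliced_kept = [row[1:2] for row in kept]        # axis 1 has size 1: nothing left

print(shape(dropped), shape(kept))
print(sliced_dropped, sliced_kept)
```

With the rank-0 reading, axis 1 of the result is the length-3 axis, so slicing it selects actual values; with the rank-1 reading the same slice runs over a singleton axis.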
Thank you for clarifying !
Yeah, there is a lot of abuse in the way trivial dimensions are sometimes discarded or added all over the place. I think the current implementation matches the spec.
But if we have an important source of such abuse somewhere in the ecosystem (like torch+torch-to-nnef), we may need to go against the spec and implement workarounds (if possible at all). Do you know what generated these models that look invalid ?
Yeah, the sk2torch Python package applied to sklearn decision trees. So it may be particular to that package.
I'm strangely not surprised, sklearn decision trees are a repeat offender in axes abuse. They have pre-everything-is-a-tensor era behaviours that are often pretty inconsistent and need custom code to figure out. Have you looked into the problem enough to suggest workaround strategies that could accommodate what they are generating without going off-spec ?
I think I'll recommend the hummingbird-ml package for the conversion -- which seems to produce fewer/no weird axes ops.
Thank you for all your help
Apologies for another Gather issue, but it appears that #1191, though it fixes #1190 and #1187, now deletes singleton dimensions upon analysis for other edge cases of models that our users have just sent over. As an example, consider the following trace generated by running tract network.onnx dump --io-long:
If we focus in on node 22:
We find that the gather op, which reduces over axis 1 of the input of size [batch_size,1,3], generates an output of size [batch_size,3] -- which is not expected behaviour. I've attached one of the edge case models for which this happens:
network.onnx.zip