ramonhagenaars / nptyping

💡 Type hints for Numpy and Pandas
MIT License

Wildcard ellipsis ... matching incorrect? #97

Closed oliver-batchelor closed 1 year ago

oliver-batchelor commented 1 year ago

As expected:

>>> from typing import Any
>>> from numpy import random
>>> from nptyping import NDArray, Shape
>>> isinstance(random.randn(3, 2, 55), NDArray[Shape["3, *, ..."], Any])
True

In the next two cases I would expect the ellipsis to match the trailing dimensions, but it doesn't:

>>> isinstance(random.randn(3, 2, 55), NDArray[Shape["3, 2, ..."], Any])
False  

>>> isinstance(random.randn(3, 2, 55), NDArray[Shape["3,  ..."], Any])
False

Finally, the ellipsis is only accepted at the end of the expression:

>>> isinstance(random.randn(3, 2, 55), NDArray[Shape[" ..., 55"], Any])
nptyping.error.InvalidShapeError: '..., 55' is not a valid shape expression.

Am I just failing to understand how the ellipsis is used here? As far as I can tell, the usual meaning comes from array indexing, where the ellipsis can match zero or more dimensions. For example, these are all valid numpy indexing expressions:

>>> x = random.randn(2, 3, 55)
>>> x[1, ...].shape
(3, 55)
>>> x[1, 1, ...].shape
(55,)
>>> x[1, 1, 1, ...].shape
()
>>> x[..., 1, 1].shape
(2,)
>>> x[1, ..., 1].shape
(3,)

ramonhagenaars commented 1 year ago

The ellipsis in an nptyping shape expression means roughly "and so forth": it repeats the dimension that precedes it. For example, Shape["3, 2, ..."] describes an array with one dimension of size 3 followed by one or more dimensions of size 2.

The wildcard (*) can be used to express a dimension of any size. So the shape of random.randn(3, 2, 55) would match against Shape["3, 2, *"]. Combining the ellipsis with the wildcard allows you to annotate an arbitrary number of dimensions of any size.
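To make the semantics described above concrete, here is a minimal pure-Python sketch of the matching rule. It is not nptyping's implementation: `shape_matches` is a hypothetical helper written for illustration, and it assumes that an expression contains only integer sizes, `*`, and an optional trailing `...` that repeats the preceding dimension.

```python
from typing import Sequence


def shape_matches(shape: Sequence[int], expr: str) -> bool:
    """Illustrative only: mimic the rule described in this thread.

    Assumption: a trailing "..." repeats the dimension before it, so
    "2, ..." stands for one or more dimensions of size 2.
    """
    tokens = [t.strip() for t in expr.split(",")]

    # The ellipsis is only accepted as the last token.
    if "..." in tokens[:-1]:
        raise ValueError(f"{expr!r} is not a valid shape expression")

    if tokens[-1] == "...":
        tokens = tokens[:-1]
        if not tokens:
            raise ValueError("'...' needs a preceding dimension to repeat")
        if len(shape) < len(tokens):
            return False
        # Repeat the last explicit dimension to cover the remaining axes.
        tokens = tokens + [tokens[-1]] * (len(shape) - len(tokens))
    elif len(shape) != len(tokens):
        return False

    # "*" accepts any size; anything else must equal the integer given.
    return all(tok == "*" or size == int(tok) for size, tok in zip(shape, tokens))


print(shape_matches((3, 2, 55), "3, *, ..."))  # True: "*" is repeated
print(shape_matches((3, 2, 55), "3, 2, ..."))  # False: 55 is not 2
print(shape_matches((3, 2, 55), "3, ..."))     # False: 2 and 55 are not 3
print(shape_matches((3, 2, 2), "3, 2, ..."))   # True: the 2 is repeated
```

Under this reading, the two results that looked wrong in the original report follow directly: the ellipsis repeats a specific dimension rather than standing in for arbitrary ones.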

For more examples, see the documentation.

It is indeed true that the ellipsis is only allowed at the end of a shape expression. The reason is that supporting it elsewhere would take significantly more effort to implement while not adding much value. What use case would benefit from an expression like Shape["3, ..., 2"]? And what about Shape["3, ..., 2, ..., 1"]?

ramonhagenaars commented 1 year ago

No more recent activity: closing.

ianpegg-bc commented 6 months ago

I would like to put in a vote for re-opening this. My use case is arrays with any number of leading dimensions followed by one or two final dimensions. For a complicated example, you could have an array like:

[batch_size, image_height, image_width, 3x4 projection matrix]

I have functions that only care about the last two dimensions, so, in the functions, I would like to type this like:

NDArray[Shape["..., 3, 4"], Any]

so it is compatible with any array where array.shape[-2:] == (3, 4). (I think that *, ... is not suitable because it implies one or more dimensions, not zero or more.)
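As a stopgap, the same condition can be checked at runtime on the shape tuple. This is only a sketch of one possible workaround, not an nptyping feature: `has_trailing_shape` is a hypothetical helper name, and it works on a plain shape tuple such as `array.shape`.

```python
from typing import Sequence, Tuple


def has_trailing_shape(shape: Sequence[int], trailing: Tuple[int, ...]) -> bool:
    """Return True if `shape` ends with `trailing`.

    Zero or more leading dimensions are allowed, which is the behaviour
    a leading ellipsis ("..., 3, 4") would express.
    """
    n = len(trailing)
    if n == 0:
        # shape[-0:] would return the whole tuple, so handle this case first.
        return True
    return len(shape) >= n and tuple(shape[-n:]) == tuple(trailing)


# [batch_size, image_height, image_width, 3x4 projection matrix]
print(has_trailing_shape((8, 480, 640, 3, 4), (3, 4)))  # True
print(has_trailing_shape((3, 4), (3, 4)))               # True: zero leading dims
print(has_trailing_shape((3, 4, 5), (3, 4)))            # False
```

A function that only cares about the last two dimensions could call this on `array.shape` at its entry point and raise if the check fails.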