python / typing

Python static typing home. Hosts the documentation and a user help forum.
https://typing.readthedocs.io/

Typing for multi-dimensional arrays #513

Open shoyer opened 6 years ago

shoyer commented 6 years ago

I'd like to open a discussion about typing for multi-dimensional arrays in general, and more specifically for NumPy. We have already been discussing this over in the NumPy issue tracker (https://github.com/numpy/numpy/issues/7370) and recently opened a new repository to start writing type stubs (https://github.com/numpy/numpy_stubs).

To help guide discussion, I wrote a document outlining ideas for array shape typing.

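To summarize:

* We would like to be able to type-check both data types (e.g., `float64`) and shapes (e.g., a 3x4 array) for multi-dimensional arrays.

* There are many use cases where support for checks using dimension identity would be valuable, e.g., to indicate that a function transforms an array with shape `(N, M)` to shape `(N,)` for arbitrary integers `N` and `M`. These dimension variables look very similar to `TypeVar`, if `TypeVar` supported integers as types.

* A notion of "zero or more additional dimensions" would also be quite valuable, and is a core part of the type for many NumPy operations (generalized ufuncs). This might be naturally written with Ellipsis, e.g., `(..., N)` for an array with a last dimension of length `N` and any number of preceding dimensions. There are particular rules (broadcasting) that should be enforced for matching multiple arguments with variable numbers of dimensions.

This will likely require some new typing features (as well as type-checker support). Notably:

* Support for literal values (#478), so we can type check operations like `array.sum(axis=0)`.

* Variadic generics (#193), so we can write types like `NDArray[N]` and `NDArray[N, M]`.

* Some sort of support for dimension identity in shapes (e.g., integer types, or `DimensionVar` as described in my doc).

* Standard syntax for writing array dtype/shape annotations: what should these look like?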
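Broadcasting is one of the rules a shape-aware checker would have to encode: shapes are aligned from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1. The following is only a runtime sketch of that rule, not a proposal for how a type checker would implement it; the name `broadcast_shape` is made up for illustration.

```python
from itertools import zip_longest
from typing import Tuple

Shape = Tuple[int, ...]


def broadcast_shape(a: Shape, b: Shape) -> Shape:
    """Return the broadcast result of two shapes, or raise ValueError."""
    result = []
    # Align from the trailing dimension; missing leading dimensions count as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(reversed(result))


print(broadcast_shape((3, 4), (4,)))       # trailing dimensions match
print(broadcast_shape((5, 1, 4), (3, 1)))  # size-1 dimensions stretch
```

A static version of this check would need the same logic to run over dimension variables rather than concrete integers, which is where the typing features discussed in this issue come in.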

ilevkivskyi commented 6 years ago

It looks like the proposal of integer generics is also relevant here https://github.com/python/mypy/issues/3345 (it looks almost identical to what you call DimensionVar).

In general, I am very supportive of this project (I have heard many times that static typing would be very helpful for data science, numerics and related fields, but current support in mypy and PEP 484 is very limited). The main obstacle however is the size of this project (it may require its own PEP). I will read your document (thanks for writing it), but already now it seems to me that it may make sense to start from features that will be useful in general (i.e. also outside of numeric stack) such as literal types and variadic generics.

Also tagging @JukkaL here just in case.

shoyer commented 6 years ago

> The main obstacle however is the size of this project (it may require its own PEP).

Yes, I expect a PEP will be necessary, especially if we want to standardize base types for typing multi-dimensional arrays in the typing module.

> it seems to me that it may make sense to start from features that will be useful in general (i.e. also outside of numeric stack) such as literal types and variadic generics.

Indeed, this is probably the best place where the broader typing community can help.

shoyer commented 6 years ago

I've opened a sub-issue for discussing syntax for array typing: https://github.com/python/typing/issues/516

ilevkivskyi commented 5 years ago

Some update on the issue:

Our (mypy core team) previous schedule for working on this was Q4 2018. However, we decided that some type system features needed to efficiently support NumPy (such as literal types and variadic generics) will also be useful in general, so we decided to implement general support for such features first. Literal types are almost there already, and variadic generics are going to be added in the coming months. After that we will start working on dedicated NumPy support (around Q2). Sorry for the delay.

ilevkivskyi commented 5 years ago

Sorry, I forgot to post notes from the latest Python typing meetup on numeric stack typing here. Here they are

vsiles commented 5 years ago

Are you specifically looking at numpy, or at the machine learning ecosystem with numpy/pytorch/... ? I found today that they are quite heterogeneous:

>>> import numpy, torch
>>> x = torch.zeros([4], dtype=torch.int8)
>>> y = torch.zeros([4], dtype=torch.float32)
>>> torch.add(x, y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: expected type torch.FloatTensor but got torch.CharTensor
>>> xx = numpy.array([4], dtype=numpy.int8)
>>> yy = numpy.array([4], dtype=numpy.float32)
>>> xx + yy
array([8.], dtype=float32)

PyTorch doesn't seem to auto-cast when the dtypes differ, whereas NumPy upcasts (see https://stackoverflow.com/questions/56022497/numpy-pytorch-dtype-conversion-compatibility/56022918?noredirect=1#comment98689941_56022918)
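To make the difference concrete, here is a toy model of the two behaviours. It is a sketch only: the `_RANK` table and both function names are invented for illustration, cover just four dtypes, and are not either library's actual promotion logic.

```python
# Hypothetical ranking of a few dtypes, narrowest to widest.
# Real promotion tables are larger and not a simple linear order.
_RANK = {"int8": 0, "int16": 1, "float32": 2, "float64": 3}


def result_dtype_promoting(a: str, b: str) -> str:
    """Upcasting behaviour: the result takes the wider of the two dtypes."""
    return a if _RANK[a] >= _RANK[b] else b


def result_dtype_strict(a: str, b: str) -> str:
    """Strict behaviour, as in the torch session above: mismatches are errors."""
    if a != b:
        raise TypeError(f"expected {a} but got {b}")
    return a


print(result_dtype_promoting("int8", "float32"))
try:
    result_dtype_strict("int8", "float32")
except TypeError as exc:
    print("rejected:", exc)
```

A type checker targeting both libraries would need a different result-dtype rule per library, even when the shape rules are shared.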

ilevkivskyi commented 5 years ago

> Are you specifically looking at numpy, or at the machine learning ecosystem with numpy/pytorch/... ?

At all of them. Dimensionality/shape will be an additional abstraction orthogonal to container type and element type.
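As a rough picture of what "orthogonal" could mean here, the sketch below keeps the container (`Matrix`, `Vector`), the element type (`DType`) and the dimensions (`N`, `M`) as separate type parameters, using only `Generic`, `TypeVar` and `Literal` as they exist today. All class and function names are hypothetical, and nothing verifies that the declared dimensions match the data; that is exactly the part that needs new type system support.

```python
from typing import Generic, Literal, Sequence, TypeVar

DType = TypeVar("DType")  # element type, e.g. float
N = TypeVar("N")          # first dimension
M = TypeVar("M")          # second dimension


class Matrix(Generic[DType, N, M]):
    """Hypothetical 2-D container; the shape lives only in the annotation."""

    def __init__(self, rows: Sequence[Sequence[DType]]) -> None:
        self.rows = [list(r) for r in rows]


class Vector(Generic[DType, N]):
    """Hypothetical 1-D container."""

    def __init__(self, items: Sequence[DType]) -> None:
        self.items = list(items)


def row_sums(m: Matrix[float, N, M]) -> Vector[float, N]:
    """Maps shape (N, M) to shape (N,): N is preserved, M is reduced away."""
    return Vector([sum(r) for r in m.rows])


m: Matrix[float, Literal[2], Literal[3]] = Matrix([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
v = row_sums(m)
print(v.items)
```

With fixed-arity generics like these, every rank needs its own class, which is one reason variadic generics keep coming up in this thread.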

vsiles commented 5 years ago

Sorry I wasn't clear, I wanted to ask about the numerical stack part specifically. Do we have a current target among numpy / pytorch / tensorflow that would focus most of the effort, or are people looking at their favorite flavor (which seem incompatible with each other)?

ilevkivskyi commented 5 years ago

> Do we have a current target among numpy / pytorch / tensorflow that would focus most of the effort, or are people looking at their favorite flavor (which seem incompatible with each other)?

There are two separate big things required to support numerical libraries:

In the first one we ideally want to be as broad as possible; I think there are no particular "preferences". In the second, I think we should probably start with NumPy, since it is the common denominator for many other libraries.

dmontagu commented 4 years ago

@ilevkivskyi do you have any suggestions for how to track progress on (or, even better, contribute to) the development of these "numeric stack typing" features? Full support for the features described in your linked notes on numeric stack typing would be incredibly useful!

ilevkivskyi commented 4 years ago

@dmontagu The best way is to just follow this issue; you can also subscribe to the typing-sig@python.org mailing list. There are no updates here because we haven't made much progress yet. Whether you can help depends on your background and how much time you are ready to spend on this. This is not a simple feature and it is hard to split into small "things".

theodoretliu commented 4 years ago

Hey! I'm a student working on a thesis and I am very interested in contributing to this project as part of my research! Mainly, I want to statically check dimensionality alignment in numpy operations. Let me know how I can help out.

ilevkivskyi commented 4 years ago

@theodoretliu Hi! It is great to hear you are interested. Just to get a bit more info, how much time will you be able to spend on this?

The best course of action is probably to implement support for relevant type system features in one of the mainstream Python type checkers. I would of course propose mypy :-) as one of its maintainers, see https://github.com/python/mypy

If this sounds right to you, I can give you a more detailed plan and some guidance.

theodoretliu commented 4 years ago

I'd be willing to dedicate pretty significant time in the coming months. And yes, that sounds like a great course of action!

vsiles commented 4 years ago

Be sure to talk to Mark Mandoza to have input from our experience doing so in Pyre :D


vsiles commented 4 years ago

Mark Mendoza* ... my fingers are a bit dumb today, sorry.


mrahtz commented 4 years ago

A group of us at DeepMind are interested in working on this too. We've set up a mailing list at https://groups.google.com/g/python-shape-checkers to try to bring all the conversations about this together in one place. I've posted a summary there of what seems to be the current state of things, but stay tuned for updates!

fylux commented 4 years ago

Hi @mrahtz,

Thanks for the initiative! Indeed, there are currently a lot of ongoing efforts in this direction. At Facebook we are working directly on this and already support several use cases in Pyre, including variadic syntax, which has been polished since the initial proposal at the Python Typing Summit. However, it would be very beneficial to get first-hand information on the state of each team working on this, since so far I have read about people working on it at Dropbox, Facebook, Google and now DeepMind.

Also, please don't miss the Python Typing mailing list.

redradist commented 4 years ago

> I'd like to open a discussion about typing for multi-dimensional arrays in general, and more specifically for NumPy. We have already been discussing this over in the NumPy issue tracker (numpy/numpy#7370) and recently opened a new repository to start writing type stubs (https://github.com/numpy/numpy_stubs).
>
> To help guide discussion, I wrote a document outlining ideas for array shape typing.
>
> To summarize:
>
> * We would like to be able to type-check both data types (e.g., `float64`) and shapes (e.g., a 3x4 array) for multi-dimensional arrays.
>
> * There are many use cases where support for checks using dimension identity would be valuable, e.g., to indicate that a function transforms an array with shape `(N, M)` to shape `(N,)` for arbitrary integers `N` and `M`. These dimension variables look very similar to `TypeVar`, if `TypeVar` supported integers as types.
>
> * A notion of "zero or more additional dimensions" would also be quite valuable, and is a core part of the type for many NumPy operations (generalized ufuncs). This might be naturally written with Ellipsis, e.g., `(..., N)` for an array with a last dimension of length `N` and any number of preceding dimensions. There are particular rules (broadcasting) that should be enforced for matching multiple arguments with variable numbers of dimensions.
>
> This will likely require some new typing features (as well as type-checker support). Notably:
>
> * Support for literal values (#478), so we can type check operations like `array.sum(axis=0)`.
>
> * Variadic generics (#193), so we can write types like `NDArray[N]` and `NDArray[N, M]`.
>
> * Some sort of support for dimension identity in shapes (e.g., integer types, or `DimensionVar` as described in my doc).
>
> * Standard syntax for writing array dtype/shape annotations: what should these look like?

You wanted this annotation:

class float64: # Custom annotation class
    def __getitem__(self, item):
        # A real version would record which form was used: float64[:], float64[:, :], etc.
        return self

float64 = float64()

def for_loop(n: float64[:,:]):
    pass

Take it ;)
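One possible way to fill in the "some value should be set" part of the snippet above is to count the slices passed to `__getitem__`. This is a sketch under assumptions: the `_DType` helper and its `ndim` attribute are hypothetical names, and the annotation is only inspectable at runtime. Static checkers generally do not accept an expression like `float64[:, :]` as a valid type.

```python
class _DType:
    """Hypothetical annotation helper: name[:, :] records the number of dimensions."""

    def __init__(self, name, ndim=None):
        self.name = name
        self.ndim = ndim

    def __getitem__(self, item):
        # A single slice arrives bare; several slices arrive as a tuple.
        dims = item if isinstance(item, tuple) else (item,)
        return _DType(self.name, ndim=len(dims))

    def __repr__(self):
        if self.ndim is None:
            return self.name
        return f"{self.name}[{', '.join(':' * self.ndim)}]"


float64 = _DType("float64")


def for_loop(n: float64[:, :]):
    pass


annotation = for_loop.__annotations__["n"]
print(annotation, annotation.ndim)
```

Returning a new object from `__getitem__`, rather than `self`, keeps `float64[:]` and `float64[:, :]` distinguishable when both appear in the same module.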

James4Ever0 commented 1 year ago

`Annotated[]` is already an efficient way to declare these types. However, to get proper static type checking on `Annotated[]`, we need support from mypy, pyanalyze, etc. To annotate and infer types that involve arithmetic, such as the result of calls like `np.reshape`, we need code that defines custom rules (beyond PEP 484) for deriving the proper types. I suspect there is currently little support for custom `Annotated[]` types, so it is not easy for users to define and statically check their own. That kind of support is probably the solution for all kinds of dynamic types in Python, since it would enable symbolic execution of arbitrary Python code.
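For reference, this is roughly what declaring a shape through `Annotated[]` can look like today. The `Shape` marker and the `check_shapes` helper are hypothetical names for this sketch; the check happens at runtime and only for 1-D lengths, because static checkers ignore `Annotated` metadata they do not understand.

```python
from typing import Annotated, Sequence, get_type_hints


class Shape:
    """Hypothetical metadata marker carrying an expected shape."""

    def __init__(self, *dims: int) -> None:
        self.dims = dims


Vec3 = Annotated[Sequence[float], Shape(3)]


def norm_squared(v: Vec3) -> float:
    return sum(x * x for x in v)


def check_shapes(func, *args):
    """Check 1-D argument lengths against Shape metadata, then call func."""
    hints = get_type_hints(func, include_extras=True)
    hints.pop("return", None)
    for (name, hint), value in zip(hints.items(), args):
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, Shape) and (len(value),) != meta.dims:
                raise TypeError(
                    f"{name}: expected shape {meta.dims}, got {(len(value),)}"
                )
    return func(*args)


print(check_shapes(norm_squared, [1.0, 2.0, 2.0]))
```

Moving this check from runtime into a static checker is the part that would need the custom rules described in the comment above.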