numpy / numpy-stubs

Experimental typing stubs for NumPy
BSD 3-Clause "New" or "Revised" License
282 stars 32 forks source link

ENH: get more specific about _ArrayLike, make it public #66

Closed person142 closed 4 years ago

person142 commented 4 years ago

Closes https://github.com/numpy/numpy-stubs/issues/37.

Add tests to check various examples.

person142 commented 4 years ago

As something of a data point, MyPy passes on SciPy when checked against this branch. Now the types in SciPy are still very rough, so take that with a grain of salt, but perhaps it means something.

rgommers commented 4 years ago

What about other array-like objects? Are things with both __array__ and __len__ passing because of Sequence, or is that not tested? Or is that what the Protocol discussion was about?

person142 commented 4 years ago

@rgommers @BvB93 thanks for the feedback-I added a _SupportsArray protocol in https://github.com/numpy/numpy-stubs/pull/66/commits/4f654e538115c8c741dcd96b0b64d5ab8d8ef80e; it is quite a nice unification.

person142 commented 4 years ago

It seems that the principle we're roughly going for here is "don't allow stuff that will produce object arrays", though we still leave an escape hatch a la

https://github.com/numpy/numpy-stubs/commit/4f654e538115c8c741dcd96b0b64d5ab8d8ef80e#diff-98e9e1660b68614cffb8585ea52a0bdcR31.

BvB93 commented 4 years ago

@rgommers @BvB93 thanks for the feedback-I added a _SupportsArray protocol in 4f654e5; it is quite a nice unification.

Btw, doesn't __array__() also have the dtype argument?

person142 commented 4 years ago

Btw, doesn't array() also have the dtype argument?

Hm, it appears to be a little wonky:

>>> np.float64(1).__array__()
array(1.)
>>> np.float64(1).__array__(np.complex128)
array(1.+0.j)
>>> np.float64(1).__array__(dtype=np.complex128)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __array__() takes no keyword arguments
>>> class A:
...     def __array__(self, dtype):
...         return np.array([1, 2, 3], dtype=dtype)
...
>>> np.array(A())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __array__() missing 1 required positional argument: 'dtype'
>>> np.array(A(), dtype=np.float64)
array([1., 2., 3.])
>>> class B:
...     def __array__(self, dtype=None):
...         return np.array([1, 2, 3], dtype=dtype)
...
>>> np.array(B())
array([1, 2, 3])
>>> np.array(B(), dtype=np.float64)
array([1., 2., 3.])
>>> class C:
...     def __array__(self):
...         return np.array([1, 2, 3])
...
>>> np.array(C())
array([1, 2, 3])
>>> np.array(C(), dtype=np.complex128)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __array__() takes 1 positional argument but 2 were given
person142 commented 4 years ago

That's unfortunate... it looks like the __array__ methods on scalars are inherently different from the user ones since they force positional arguments, whereas only positional arguments in the user ones don't work.

person142 commented 4 years ago

Ok, I made the protocol

class _SupportsArray(Protocol):
    def __array__(
        self, dtype: Optional[_DTypeLike] = ...
    ) -> Union[Sequence, ndarray]: ...

This unfortunately means that ndarray and generic no longer satisfy it! To work around that I added them back to the big union. This is kind of gross, but I think it's the best we can do given the above examples. WDYT?

BvB93 commented 4 years ago

This unfortunately means that ndarray and generic no longer satisfy it! To work around that I added them back to the big union. This is kind of gross, but I think it's the best we can do given the above examples. WDYT?

I'd suggest adding another overload to the _SupportsArray protocol instead, such that it works with in situations where dtype is positional-only or keyword-or-positional:

# first overload: dtype is optional and positional-only
# Second overload: dtype is optional and can be a positional or keyword argument
class _SupportsArray(Protocol):
    @overload
    def __array__(self, __dtype: _DtypeLike = ...) -> ndarray: ...
    @overload
    def __array__(self, dtype: _DtypeLike = ...) -> ndarray: ...

class A():
    def __array__(self, dtype: _DtypeLike = None) -> ndarray:
        return np.array([1, 2, 3], dtype=dtype)

class B():
    def __array__(self, __dtype: _DtypeLike = None) -> ndarray:
        return np.array([1, 2, 3], dtype=__dtype)

a = A()
a.__array__()
a.__array__(float)
a.__array__(dtype=float)

b = B()
b.__array__()
b.__array__(float)
b.__array__(dtype=float)  # E: Unexpected keyword argument
b.__array__(__dtype=float)  # E: Unexpected keyword argument

By the way, why is the returned type annotated as Union[Sequence, ndarray] in your example? Doesn't it always just return ndarray?

person142 commented 4 years ago

I'd suggest adding another overload to the _SupportsArray protocol instead, such that it works with in situations where dtype is positional-only or keyword-or-positional

The signatures are incompatible, so that’s going to violate Liskov I think. (But I’ll give it a try.)

By the way, why is the returned type annotated as Union[Sequence, ndarray] in your example?

I was going by the docs here:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html

which say ndarray or nested sequence.

BvB93 commented 4 years ago

The signatures are incompatible, so that’s going to violate Liskov I think. (But I’ll give it a try.)

It's not quite clear to me where this incompatibility is located. __array__() (ref) takes self and (optionally) dtype as argument. The newly overloaded _SupportsArray can now deal with cases where dtype is either keyword-only or positional-or-keyword.

I was going by the docs here:

I believe what they're talking about in the docs is an object which either: a. Has an __array__() method that returns an ndarray. b. Is a (nested) sequence.

So Union[_SupportsArray, Sequence[Any]] in other words, rather than __array__() itself (potentially) returning a Sequence.

person142 commented 4 years ago

It's not quite clear to me where this incompatible is located.

I believe what they're talking about in the docs is an object which either:

Yup, right on both counts! Updated; hopefully we're good to go now.

Note that I had to make _DtypeLike public too so that someone could use it in the annotations for their __array__ method, but that's ok because it was desired anyway (https://github.com/numpy/numpy-stubs/issues/13).

shoyer commented 4 years ago

This looks really nice!

My own serious concern here is about adding public type aliases. These do seem quite useful, but what would this imply if/when we move type annotations into NumPy properly? If np.ArrayLike needs to be valid, does that imply that if we move annotations into NumPy we need to expose ArrayLike in the public API of NumPy?

person142 commented 4 years ago

If np.ArrayLike needs to be valid, does that imply that if we move annotations into NumPy we need to expose ArrayLike in the public API of NumPy?

Thankfully it doesn't-this is the same sort of fudging that happens with things like Queue[T] in the stdlib (where the real class isn't actually generic). But it does mean that people using annotations need to take one of these options:

import numpy as np

x: "np.ArrayLike"  # Use strings
from __future__ import annotations

import numpy as np

x: np.ArrayLike  # Now it's treated as a string anyway
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from numpy import ArrayLike
else:
    ArrayLike = None  # Or whatever

x: ArrayLike
person142 commented 4 years ago

Although maybe I am interpreting your question in the wrong way. I think a backdrop to my answer above is an assumption that even when we move the types into NumPy, they will remain stubs instead of being inlined into the code.

I suspect that this is the right course of action because we've seen that the types require a fair bit of "fudging"; i.e. we aren't trying to represent the full NumPy API but instead some typeable subset of it. I think that if the types are inlined then we lose the ability to do that fudging as well. (And lose the ability to do a gazillion overloads; doubt that is going to fly in the NumPy codebase proper.)

I'd contrast this to e.g. SciPy where we are inlining the types, mostly because there isn't much odd there (except for needing a bunch of stubs for extension modules).

ethanhs commented 4 years ago

FWIW, I've heard that at the language summit they decided __future__.annotations will become default in 3.10 I believe.

shoyer commented 4 years ago

I appreciate that we could need np.ArrayLike in strings or type annotations, but I suspect that it could still lead to some user confusion to not define them at runtime, too, e.g., to cover use cases like type aliases. The user experience is independent of whether we choose to use stubs or annotations inside NumPy — though I expect we’ll probably end up with some of both.

NumPy-stubs is certainly still experimental, so you definitely have my blessing to go ahead for now. But I do think it could be worth sounding out the broader NumPy community on the appetite for adding a selective handful of type protocols into NumPy proper. We might get some useful feedback. For example, should the protocol be called ArrayLike or array_like as currently appears in many NumPy docstrings?

rgommers commented 4 years ago

If it helps making ArrayLike et al. public, I don't see a big issue with that; would probably prefer to write it as

from numpy.types import ArrayLike

x: ArrayLike

to keep it out of the main namespace. In my (limited) experience, it's helpful for things to exist at runtime.

person142 commented 4 years ago

from numpy.types import ArrayLike

Keeping it out of the main namespace seems good, though maybe it should be numpy.typing instead of numpy.types to match the stdlib module?

person142 commented 4 years ago

Re

But I do think it could be worth sounding out the broader NumPy community on the appetite for adding a selective handful of type protocols into NumPy proper.

I'll send out something to the mailing list.

rgommers commented 4 years ago

from numpy.types import ArrayLike

Keeping it out of the main namespace seems good, though maybe it should be numpy.typing instead of numpy.types to match the stdlib module?

No, I avoided that name on purpose, naming something the same as a stdlib module is usually a bad idea (e.g. scipy.io was a pretty bad choice, that's why it's usually import as spio).

BvB93 commented 4 years ago

What about something along the lines of numpy.annotations?

eric-wieser commented 4 years ago

Keeping it out of the main namespace seems good, though maybe it should be numpy.typing instead of numpy.types to match the stdlib module?

No, I avoided that name on purpose, naming something the same as a stdlib module is usually a bad idea

As a counterargument, numpy.distutils clashes with distutils, but we used it anyway, presumably because the similarity was worth emphasizing.

Almost all the usage of type annotations I've seen in the wild has erred on the side of keeping the annotations as short as possible, as:

from typing import Tuple
from numpy.typing import ArrayLike

def get_arr() -> Tuple[ArrayLike, int]: ...

The similarity aids reading here, and the clash is irrelevant.

Alternatively, some users might want the full names anyway. The clash is again irrelevant:

def get_arr() -> typing.Tuple[np.typing.ArrayLike, int]: ...

Finally, if the user cares enough to import just the submodule, they probably want to do something similar with typing anyway (note: I've not actually seen anyone do this).

import typing as t
import numpy.typing as npt
def get_arr() -> t.Tuple[npt.ArrayLike, int]: ...
person142 commented 4 years ago

Ok, discussion on the mailing list:

http://numpy-discussion.10968.n7.nabble.com/Feelings-about-type-aliases-in-NumPy-td48059.html

seems to be dying down. Takeaways so far:

So, as is often the case, seems like there's rough consensus except on what to name the darned thing.

TomAugspurger commented 4 years ago

cc @simonjayhawkins if you see any issues with how this ArrayLike will interact with pandas' typing.

person142 commented 4 years ago

I don't want us to lose momentum here, so: I find @eric-wieser's comment

Almost all the usage of type annotations I've seen in the wild has erred on the side of keeping the annotations as short as possible

to match with my own experience as well-typing in Python is fairly verbose, so short forms like

import typing as t

or

from typing import ...

seem to be the norm. For that reason I propose that we move ahead with putting things in numpy.typing. For now it will be in the stubs only, and when we merge the stubs into NumPy itself we can make it available at runtime.

Are people ok with that, or shall be continue to discuss?

shoyer commented 4 years ago

+1 for numpy.typing, but please send this to the email list, which is the place of record for design decisions

On Sat, May 9, 2020 at 10:17 AM Josh Wilson notifications@github.com wrote:

I don't want us to lose momentum here, so: I find @eric-wieser https://github.com/eric-wieser's comment

Almost all the usage of type annotations I've seen in the wild has erred on the side of keeping the annotations as short as possible

to match with my own experience as well-typing in Python is fairly verbose, so short forms like

import typing as t

or

from typing import ...

seem to be the norm. For that reason I propose that we move ahead with putting things in numpy.typing. For now it will be in the stubs only, and when we merge the stubs into NumPy itself we can make it available at runtime.

Are people ok with that, or shall be continue to discuss?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/numpy/numpy-stubs/pull/66#issuecomment-626208122, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJJFVU6ZQ67KMCLMA4UE6LRQWF27ANCNFSM4MM7A2OQ .

rgommers commented 4 years ago

Sounds fine to me, thanks for keeping this moving.

person142 commented 4 years ago

Ok, mailing list has been notified, PR has been rebased, review comments have been addressed (I think), the types have been moved into numpy.typing, and the tests have been updated accordingly.

person142 commented 4 years ago

Any objections to moving forward? Since this conflicts with just about every other PR it would be nice to get it in to avoid more rebasing.

BvB93 commented 4 years ago

No complaints here from my side; feel free to continue.

shoyer commented 4 years ago

Look good to me!

person142 commented 4 years ago

In it goes then. Thanks for reviewing everyone!