Closed person142 closed 4 years ago
As something of a data point, MyPy passes on SciPy when checked against this branch. Now the types in SciPy are still very rough, so take that with a grain of salt, but perhaps it means something.
What about other array-like objects? Are things with both __array__
and __len__
passing because of Sequence
, or is that not tested? Or is that what the Protocol discussion was about?
@rgommers @BvB93 thanks for the feedback-I added a _SupportsArray
protocol in https://github.com/numpy/numpy-stubs/pull/66/commits/4f654e538115c8c741dcd96b0b64d5ab8d8ef80e; it is quite a nice unification.
It seems that the principle we're roughly going for here is "don't allow stuff that will produce object arrays", though we still leave an escape hatch a la
@rgommers @BvB93 thanks for the feedback-I added a
_SupportsArray
protocol in 4f654e5; it is quite a nice unification.
Btw, doesn't __array__()
also have the dtype
argument?
Btw, doesn't array() also have the dtype argument?
Hm, it appears to be a little wonky:
>>> np.float64(1).__array__()
array(1.)
>>> np.float64(1).__array__(np.complex128)
array(1.+0.j)
>>> np.float64(1).__array__(dtype=np.complex128)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __array__() takes no keyword arguments
>>> class A:
... def __array__(self, dtype):
... return np.array([1, 2, 3], dtype=dtype)
...
>>> np.array(A())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __array__() missing 1 required positional argument: 'dtype'
>>> np.array(A(), dtype=np.float64)
array([1., 2., 3.])
>>> class B:
... def __array__(self, dtype=None):
... return np.array([1, 2, 3], dtype=dtype)
...
>>> np.array(B())
array([1, 2, 3])
>>> np.array(B(), dtype=np.float64)
array([1., 2., 3.])
>>> class C:
... def __array__(self):
... return np.array([1, 2, 3])
...
>>> np.array(C())
array([1, 2, 3])
>>> np.array(C(), dtype=np.complex128)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __array__() takes 1 positional argument but 2 were given
That's unfortunate... it looks like the __array__
methods on scalars are inherently different from the user ones since they force positional arguments, whereas only positional arguments in the user ones don't work.
Ok, I made the protocol
class _SupportsArray(Protocol):
def __array__(
self, dtype: Optional[_DTypeLike] = ...
) -> Union[Sequence, ndarray]: ...
This unfortunately means that ndarray
and generic
no longer satisfy it! To work around that I added them back to the big union. This is kind of gross, but I think it's the best we can do given the above examples. WDYT?
This unfortunately means that ndarray and generic no longer satisfy it! To work around that I added them back to the big union. This is kind of gross, but I think it's the best we can do given the above examples. WDYT?
I'd suggest adding another overload to the _SupportsArray
protocol instead, such that it works with in situations where dtype
is positional-only or keyword-or-positional:
# first overload: dtype is optional and positional-only
# Second overload: dtype is optional and can be a positional or keyword argument
class _SupportsArray(Protocol):
@overload
def __array__(self, __dtype: _DtypeLike = ...) -> ndarray: ...
@overload
def __array__(self, dtype: _DtypeLike = ...) -> ndarray: ...
class A():
def __array__(self, dtype: _DtypeLike = None) -> ndarray:
return np.array([1, 2, 3], dtype=dtype)
class B():
def __array__(self, __dtype: _DtypeLike = None) -> ndarray:
return np.array([1, 2, 3], dtype=__dtype)
a = A()
a.__array__()
a.__array__(float)
a.__array__(dtype=float)
b = B()
b.__array__()
b.__array__(float)
b.__array__(dtype=float) # E: Unexpected keyword argument
b.__array__(__dtype=float) # E: Unexpected keyword argument
By the way, why is the returned type annotated as Union[Sequence, ndarray]
in your example? Doesn't it always just return ndarray
?
I'd suggest adding another overload to the _SupportsArray protocol instead, such that it works with in situations where dtype is positional-only or keyword-or-positional
The signatures are incompatible, so that’s going to violate Liskov I think. (But I’ll give it a try.)
By the way, why is the returned type annotated as Union[Sequence, ndarray] in your example?
I was going by the docs here:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html
which say ndarray or nested sequence.
The signatures are incompatible, so that’s going to violate Liskov I think. (But I’ll give it a try.)
It's not quite clear to me where this incompatibility is located.
__array__()
(ref) takes self
and (optionally) dtype
as argument. The newly overloaded _SupportsArray
can now deal with cases where dtype
is either keyword-only or positional-or-keyword.
I was going by the docs here:
I believe what they're talking about in the docs is an object which either:
a. Has an __array__()
method that returns an ndarray
.
b. Is a (nested) sequence.
So Union[_SupportsArray, Sequence[Any]]
in other words, rather than __array__()
itself (potentially) returning a Sequence.
It's not quite clear to me where this incompatible is located.
I believe what they're talking about in the docs is an object which either:
Yup, right on both counts! Updated; hopefully we're good to go now.
Note that I had to make _DtypeLike
public too so that someone could use it in the annotations for their __array__
method, but that's ok because it was desired anyway (https://github.com/numpy/numpy-stubs/issues/13).
This looks really nice!
My own serious concern here is about adding public type aliases. These do seem quite useful, but what would this imply if/when we move type annotations into NumPy properly? If np.ArrayLike
needs to be valid, does that imply that if we move annotations into NumPy we need to expose ArrayLike
in the public API of NumPy?
If np.ArrayLike needs to be valid, does that imply that if we move annotations into NumPy we need to expose ArrayLike in the public API of NumPy?
Thankfully it doesn't-this is the same sort of fudging that happens with things like Queue[T]
in the stdlib (where the real class isn't actually generic). But it does mean that people using annotations need to take one of these options:
import numpy as np
x: "np.ArrayLike" # Use strings
from __future__ import annotations
import numpy as np
x: np.ArrayLike # Now it's treated as a string anyway
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from numpy import ArrayLike
else:
ArrayLike = None # Or whatever
x: ArrayLike
Although maybe I am interpreting your question in the wrong way. I think a backdrop to my answer above is an assumption that even when we move the types into NumPy, they will remain stubs instead of being inlined into the code.
I suspect that this is the right course of action because we've seen that the types require a fair bit of "fudging"; i.e. we aren't trying to represent the full NumPy API but instead some typeable subset of it. I think that if the types are inlined then we lose the ability to do that fudging as well. (And lose the ability to do a gazillion overloads; doubt that is going to fly in the NumPy codebase proper.)
I'd contrast this to e.g. SciPy where we are inlining the types, mostly because there isn't much odd there (except for needing a bunch of stubs for extension modules).
FWIW, I've heard that at the language summit they decided __future__.annotations
will become default in 3.10 I believe.
I appreciate that we could need np.ArrayLike in strings or type annotations, but I suspect that it could still lead to some user confusion to not define them at runtime, too, e.g., to cover use cases like type aliases. The user experience is independent of whether we choose to use stubs or annotations inside NumPy — though I expect we’ll probably end up with some of both.
NumPy-stubs is certainly still experimental, so you definitely have my blessing to go ahead for now. But I do think it could be worth sounding out the broader NumPy community on the appetite for adding a selective handful of type protocols into NumPy proper. We might get some useful feedback. For example, should the protocol be called ArrayLike
or array_like
as currently appears in many NumPy docstrings?
If it helps making ArrayLike
et al. public, I don't see a big issue with that; would probably prefer to write it as
from numpy.types import ArrayLike
x: ArrayLike
to keep it out of the main namespace. In my (limited) experience, it's helpful for things to exist at runtime.
from numpy.types import ArrayLike
Keeping it out of the main namespace seems good, though maybe it should be numpy.typing
instead of numpy.types
to match the stdlib module?
Re
But I do think it could be worth sounding out the broader NumPy community on the appetite for adding a selective handful of type protocols into NumPy proper.
I'll send out something to the mailing list.
from numpy.types import ArrayLike
Keeping it out of the main namespace seems good, though maybe it should be
numpy.typing
instead ofnumpy.types
to match the stdlib module?
No, I avoided that name on purpose, naming something the same as a stdlib module is usually a bad idea (e.g. scipy.io
was a pretty bad choice, that's why it's usually import as spio
).
What about something along the lines of numpy.annotations
?
Keeping it out of the main namespace seems good, though maybe it should be
numpy.typing
instead ofnumpy.types
to match the stdlib module?No, I avoided that name on purpose, naming something the same as a stdlib module is usually a bad idea
As a counterargument, numpy.distutils
clashes with distutils
, but we used it anyway, presumably because the similarity was worth emphasizing.
Almost all the usage of type annotations I've seen in the wild has erred on the side of keeping the annotations as short as possible, as:
from typing import Tuple
from numpy.typing import ArrayLike
def get_arr() -> Tuple[ArrayLike, int]: ...
The similarity aids reading here, and the clash is irrelevant.
Alternatively, some users might want the full names anyway. The clash is again irrelevant:
def get_arr() -> typing.Tuple[np.typing.ArrayLike, int]: ...
Finally, if the user cares enough to import just the submodule, they probably want to do something similar with typing
anyway (note: I've not actually seen anyone do this).
import typing as t
import numpy.typing as npt
def get_arr() -> t.Tuple[npt.ArrayLike, int]: ...
Ok, discussion on the mailing list:
http://numpy-discussion.10968.n7.nabble.com/Feelings-about-type-aliases-in-NumPy-td48059.html
seems to be dying down. Takeaways so far:
So, as is often the case, seems like there's rough consensus except on what to name the darned thing.
cc @simonjayhawkins if you see any issues with how this ArrayLike will interact with pandas' typing.
I don't want us to lose momentum here, so: I find @eric-wieser's comment
Almost all the usage of type annotations I've seen in the wild has erred on the side of keeping the annotations as short as possible
to match with my own experience as well-typing in Python is fairly verbose, so short forms like
import typing as t
or
from typing import ...
seem to be the norm. For that reason I propose that we move ahead with putting things in numpy.typing
. For now it will be in the stubs only, and when we merge the stubs into NumPy itself we can make it available at runtime.
Are people ok with that, or shall be continue to discuss?
+1 for numpy.typing, but please send this to the email list, which is the place of record for design decisions
On Sat, May 9, 2020 at 10:17 AM Josh Wilson notifications@github.com wrote:
I don't want us to lose momentum here, so: I find @eric-wieser https://github.com/eric-wieser's comment
Almost all the usage of type annotations I've seen in the wild has erred on the side of keeping the annotations as short as possible
to match with my own experience as well-typing in Python is fairly verbose, so short forms like
import typing as t
or
from typing import ...
seem to be the norm. For that reason I propose that we move ahead with putting things in numpy.typing. For now it will be in the stubs only, and when we merge the stubs into NumPy itself we can make it available at runtime.
Are people ok with that, or shall be continue to discuss?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/numpy/numpy-stubs/pull/66#issuecomment-626208122, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJJFVU6ZQ67KMCLMA4UE6LRQWF27ANCNFSM4MM7A2OQ .
Sounds fine to me, thanks for keeping this moving.
Ok, mailing list has been notified, PR has been rebased, review comments have been addressed (I think), the types have been moved into numpy.typing
, and the tests have been updated accordingly.
Any objections to moving forward? Since this conflicts with just about every other PR it would be nice to get it in to avoid more rebasing.
No complaints here from my side; feel free to continue.
Look good to me!
In it goes then. Thanks for reviewing everyone!
Closes https://github.com/numpy/numpy-stubs/issues/37.
Add tests to check various examples.