Closed matangover closed 2 years ago
I suggest adding to _typeshed
an alias AnyIterable: TypeAlias = Union[Iterable | SupportsGetItem]
Or, well, since SupportsGetItem has two type parameters that won't exactly work, but something along those lines.
This affects every single place where Iterable
is accepted as an argument, except in the unlikely case that the code directly calls .__iter__
.
We could add an alias like you suggest and use it everywhere, but that doesn't seem like a great solution. The alias wouldn't get used in code outside typeshed, it would be harder for users to pick up this special case, and type checker error messages may get worse.
The alias wouldn't get used in code outside typeshed, it would be harder for users to pick up this special case
I don't think it's common to call iter
or PySequence_Fast
on function arguments in "user" code... At least I never do that in my code. Do you suggest to add a new type to typing_extensions
or typing
?
type checker error messages may get worse
True. But in a minor way. It would probably only change from showing "Iterable" to "SequenceOrIterable". Which is in fact more correct and less misleading.
In the end, the job of typeshed is to correctly annotate the stdlib
I don't think it's common to call
iter
orPySequence_Fast
on function arguments in "user" code... At least I never do that in my code. Do you suggest to add a new type totyping_extensions
ortyping
?
Anything useful you do with an iterable will ultimately call one of those functions (or some other part of the Python implementation that supports the __getitem__
-based protocol).
Right. So maybe indeed the right thing is to add it to typing
or collections.abc
. Maybe we could fix typeshed's annotations first, and later if a new type is introduced to Python it can be adopted.
I don't think typeshed needs changes here, the problem is that Iterable
has special additional behaviour not representable in the stub. It'd need to be handled specially in the type checker.
I disagree, the type annotations are wrong. They specify the argument has to have __iter__
but it's not true. This is really easy to fix in typeshed.
special additional behaviour not representable in the stub
Wait what behavior do you mean? Apart from having __getitem__
?
The behavior is representable in a stub. Iterable would be defined something like this:
class _IterIterable(Protocol[_T_co]):
def __iter__(self) -> Iterator[_T_co]: ...
class _GetItemIterable(Protocol[_T_co]):
def __getitem__(self, __idx: int) -> _T_co: ...
Iterable: TypeAlias = _IterIterable[_T_co] | _GetItemIterable[_T_co]
But I still don't like this change much. Suppose we add a typing_extensions.RealIterable
defined like the above. And then:
Iterable
ABC so people can use it as a base class.Iterable
to RealIterable
, creating tons of work throughout the ecosystem.And all that for an edge case so rare that it's only been reported a handful of times in mypy's years of history.
Even that's not exactly correct, since it doesn't do the __iter__ = None
behaviour, and since unions aren't ordered this is ambiguous with conflicting getitem/iter definitions - mappings for instance. It's just another case where magic methods have unusual behaviour.
And all that for an edge case so rare that it's only been reported a handful of times in mypy's years of history.
I ran into it with torch.utils.data.Dataset
, which implements __getitem__
but not __iter__
. It works great with stdlib functions at runtime but generates lots of false type errors. A lot of people use this class.
I really wouldn't mind if the fix is in type checkers instead of in typeshed. But I do think there should be a fix :) Personally I think that having a special case for Iterable in all type checkers is an "even worse" solution.
@erictraut maybe you could have an opinion or suggestion here
The fix here cannot (or rather should not) be provided by type checkers. If this problem needs to be fixed, it should be done so in the stubs.
I also question whether this is a problem that needs fixing. As Jelle indicated, this seems like an extreme edge case that rarely comes up. Perhaps we'd be better off documenting a workaround?
In addition to the downsides that Jelle mentioned, I'll add one more: replacing Iterable
with a union of two generic protocol types will likely have a measurable impact on type checking performance for many code bases. This is such a core type that is used in so many common stdlib methods, and forcing type checkers to evaluate two protocols instead of one will likely add to analysis times.
Thanks for weighing in, good points
Perhaps we'd be better off documenting a workaround?
Do you mean implementing __iter__
? (i.e. in this case, the developers of PyTorch adding Dataset.__iter__
)
Do you mean implementing
__iter__
?
Or wrapping it in a call to the list
constructor. I realize this has a bit of runtime overhead, but in most cases where the number of elements is small, that's a reasonable workaround.
Or wrapping it in a call to the list constructor
Definitely not an option for Dataset
I opened a PR against torch that might help :-) https://github.com/pytorch/pytorch/pull/77116
Here's another possible workaround, assuming I can't modify original library's code (as @hauntsaninja did), and I want to have a function accepting any iterable (e.g. pytorch Dataset and other "normal" iterables), with only a thin wrapper without coercing to list
:
from typing import Generic, Iterable, Iterator, Protocol, TypeVar, Union
from typing_extensions import TypeAlias
T_co = TypeVar("T_co", covariant=True)
class GetItemIterable(Protocol[T_co]):
def __getitem__(self, __idx: int) -> T_co: ...
AnyIterable: TypeAlias = Union[Iterable[T_co], GetItemIterable[T_co]]
class GetItemIterableWrapper(Generic[T_co]):
def __init__(self, iterable: GetItemIterable[T_co]):
self.iterable = iterable
def __iter__(self) -> Iterator[T_co]:
return iter(self.iterable) # type: ignore
def to_iterable(x: AnyIterable[T_co]) -> Iterable[T_co]:
return x if isinstance(x, Iterable) else GetItemIterableWrapper(x)
# Example use:
class MyIterable:
def __getitem__(self, idx: int) -> str:
if idx >= 3:
raise IndexError()
return "blah"
def strjoin1(sep: str, iterable: AnyIterable[str]):
# The following line gives type error (but works at runtime)
return sep.join(iterable)
def strjoin2(sep: str, iterable: AnyIterable[str]):
# The following line gives no error
return sep.join(to_iterable(iterable))
print("Testing __iter__ iterable:")
print(strjoin1(" ", ["1", "2", "3"]))
print(strjoin2(" ", ["1", "2", "3"]))
print("\nTesting __getitem__ iterable:")
print(strjoin1(" ", MyIterable()))
print(strjoin2(" ", MyIterable()))
Hmm, I guess I would be in favour of a single typeshed change: allow iter
to accept a GetItemIterable
. Then you could use iter(MyIterable())
in places that expect an Iterator
. Hopefully in most situations this is most of what you need and you can get away without the friction of defining your own protocols and helpers as above.
Yes, that would remove the need for type: ignore
in my example. The rest would still be needed.
Oh, you are totally right, it isn't! Because an iterator is also iterable...
Hopefully https://github.com/python/typeshed/pull/7817 will go some way towards making things more ergonomic :-)
Thanks for the report, and I agree that this is one of 9857 things that are slightly annoying about Python's typing system. I think I speak for my fellow maintainers, though, when I say that we're not going to make the change you're asking for here. While it's true that objects with __getitem__
but not __iter__
do support iteration, I think that should properly be viewed as an accident of history rather than something we should bend over backwards to account for in typeshed. If a class is meant to be iterable, it should generally define __iter__
.
I'm glad @hauntsaninja's ingenious #7817 was merged, and I hope his pytorch PR is also merged :)
Thanks all for your insightful comments, I totally agree with what you said @AlexWaygood. I think the change applied by @hauntsaninja is an excellent workaround. One thing that could be done is to document this somewhere, but seems like there is no real "Python typing docs" place, only the type checkers document this individually
We could put up an essay on typing.readthedocs.io, similar to https://typing.readthedocs.io/en/latest/source/unreachable.html about exhaustiveness checking.
Many functions in stdlib that are documented to accept an "iterable" are annotated as
Iterable[T]
. This means the arguments must have__iter__
. However, the actual functions don't require the arguments to have__iter__
, they also work with arguments that have only__getitem__
. That's because internally they useiter
(in Python) or PySequence_Fast (in C).As the docs for both
iter
andPySequence_Fast
functions show, these functions support two protocols for the argument: the "iterable" protocol (__iter__
) and the "sequence" protocol (__getitem__
).Example functions that have this bug right now:
For enumerate, a bug was actually filed on the mypy repo but in fact I believe this should be corrected in typeshed.
Aside:
for
loops also similarly accept "iterables" that have only__getitem__
, but that is implemented inside type checkers themselves. For now mypy still doesn't support it properly, but Pyright does.