Closed gandhis1 closed 2 years ago
Is a generator really acceptable? I would think that pandas might want to require `len` on both the inner and outer containers.
The documentation says it supports an `Iterable`, and isn't a `Generator` an `Iterable`? And besides, they don't currently call `len` (the above code example works).
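The runtime behaviour described in this comment can be sketched as follows. This is not the example from the original report; the `rows` function and the column names are made up for illustration, and it assumes a reasonably recent pandas release.

```python
import pandas as pd


def rows():
    # Hypothetical generator: the outer container has no len().
    yield [1, 2]
    yield [3, 4]


# The generator is passed directly as the `data` argument.
df = pd.DataFrame(rows(), columns=["a", "b"])
print(df)
```

The constructor consumes the generator, so the question in this thread is only about what the type annotations admit, not about runtime support.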
You could try adding `Iterable` to `ListLikeU`. The only concern is that there might be `Iterable` objects that would not be acceptable to pandas, and that could be really hard to test.
In other words, let's suppose there is a class that is `Iterable`, and a user writes code to pass an instance of that class to the `DataFrame` constructor. It might be the case that pandas would fail using that class, but it would still pass the type checker. That means the type `Iterable` is too wide for the constructor.
I'm not sure that will happen here, but it is something to be cognizant of, as our testing methodology doesn't pick up cases where the types are too wide.
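To make the "too wide" concern concrete, here is a small sketch that does not use pandas at all. `build_rows` is a hypothetical stand-in for a constructor whose annotation accepts any `Iterable`, while its runtime check rejects `str` and `bytes`.

```python
from collections.abc import Iterable
from typing import Any, List


def build_rows(data: Iterable[Iterable[Any]]) -> List[List[Any]]:
    """Hypothetical function whose annotation is wider than what it accepts."""
    if isinstance(data, (str, bytes)):
        # str is Iterable[str], and each character is again a str,
        # so the annotation above admits it even though we reject it here.
        raise TypeError("str and bytes are not accepted at runtime")
    return [list(row) for row in data]


# A generator of lists satisfies the annotation and works at runtime.
accepted = build_rows(row for row in ([1, 2], [3, 4]))

# A str also satisfies the annotation, but fails at runtime.
try:
    build_rows("ab")
    rejected = False
except TypeError:
    rejected = True
```

A test suite that only checks that valid calls type-check would not notice the second call, which is the gap described above.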
Roughly speaking, a generator is used when something iterates and the end condition isn't known. An iterable is when you know how many times the loop will run.
A Generator is an Iterator and also an Iterable: https://github.com/python/typeshed/blob/66751e2ebfed2540715426b3d6b2ffb8c8e16b57/stdlib/typing.pyi#L356
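The same relationship can be checked at runtime with the abstract base classes in `collections.abc`. The `rows` generator below is only an illustration.

```python
from collections.abc import Generator, Iterable, Iterator, Sized


def rows():
    # Illustrative generator of two rows.
    yield [1, 2]
    yield [3, 4]


gen = rows()

gen_is_generator = isinstance(gen, Generator)
gen_is_iterator = isinstance(gen, Iterator)
gen_is_iterable = isinstance(gen, Iterable)
gen_is_sized = isinstance(gen, Sized)  # generators do not define __len__

# For comparison, a list is Iterable and Sized but not an Iterator.
lst = [[1, 2], [3, 4]]
list_is_iterable = isinstance(lst, Iterable)
list_is_iterator = isinstance(lst, Iterator)
list_is_sized = isinstance(lst, Sized)
```

So whether the length is known in advance is a matter of `Sized`, not of `Iterable`.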
Pandas allows almost all iterables for `pd.DataFrame(data)` but has some prominent exclusions: `str`, `bytes`, and sets (excluded in `is_list_like`).
I think it would definitely be safe to allow `Generator` for `pd.DataFrame`, as it doesn't include `str`/`bytes`/sets.
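A quick way to see these exclusions is the public `pandas.api.types.is_list_like` helper. The sketch below assumes a pandas version in which that helper takes the `allow_sets` keyword; as far as I know, sets are only rejected when `allow_sets=False` is passed.

```python
from pandas.api.types import is_list_like


def rows():
    # Illustrative generator of two rows.
    yield [1, 2]
    yield [3, 4]


generator_ok = is_list_like(rows())  # generators count as list-like
str_ok = is_list_like("abc")         # str is excluded
bytes_ok = is_list_like(b"abc")      # bytes is excluded

# Sets are only excluded when the caller opts out of them.
set_ok_strict = is_list_like({1, 2}, allow_sets=False)
```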
A two-dimensional array (list-of-list) is an acceptable value to pass to the `data` argument of the `pd.DataFrame` initializer. This works fine; however, when the outer dimension is a `Generator` and not a `List`, this does not work. It seems `ListLikeU` should really be any `Iterable` (noting that this would then overlap somewhat with the `Iterable[Tuple]` annotation).
Example:
Note that the actual types here are irrelevant, which is why I manually annotated as `Any`. The error remains even when you remove this annotation.
Output: