machinalis / mypy-data

mypy typesheds for the Python data stack
BSD 3-Clause "New" or "Revised" License

_ArrayLike[Any] vs. ndarray[Any] #14

Open kjyv opened 7 years ago

kjyv commented 7 years ago

It seems a bit unclear to me how to properly annotate functions that expect an ndarray. Many ndarray methods (e.g. flatten) return _ArrayLike, which is then not recognized as an ndarray. I guess I can't, and shouldn't, use _ArrayLike in my own annotations. Is this currently not supported properly by mypy, or do I have to do this differently? By the way, the numpy docs actually specify the return type of, e.g., flatten as ndarray, not "array_like".
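At runtime the numpy docs hold up: flatten and basic slicing both return plain ndarray objects, which suggests the mismatch lives in the stubs rather than in numpy itself. A quick check, assuming any reasonably recent NumPy:

```python
import numpy as np

arr = np.ones((2, 3))

flat = arr.flatten()   # flatten always returns a new 1-D copy
part = arr[0:1]        # basic slicing returns a view

# Both results are concrete ndarray instances, not a separate "array-like" type.
print(type(flat).__name__, flat.shape)  # ndarray (6,)
print(type(part).__name__, part.shape)  # ndarray (1, 3)
```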

kjyv commented 7 years ago

It seems to me that many methods are defined in the _ArrayLike class and return _ArrayLike, but the actual numpy methods are only defined on numpy.ndarray and always return an ndarray. Perhaps they should be moved to the ndarray class to remove any type confusion. For example, slicing an ndarray should also produce an ndarray, not a value of type _ArrayLike.
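A minimal sketch of the proposed layout, using hypothetical pure-Python stand-ins rather than the real stub file: the shared base stays empty, and the methods are declared on ndarray with ndarray return types, so a checker would accept both a slice and a flatten() result where an ndarray is expected. The toy only holds 1-D data, so flatten is just a copy here.

```python
from typing import Any, List, Sequence


class _ArrayLike:
    """Toy stand-in for the stub's base class (hypothetical, not the real stub)."""


class ndarray(_ArrayLike):
    """Toy stand-in for numpy.ndarray: methods live here and return ndarray."""

    def __init__(self, data: Sequence[Any]) -> None:
        self._data: List[Any] = list(data)

    def flatten(self) -> "ndarray":
        # Declared on ndarray, so a checker infers ndarray, not _ArrayLike.
        # The toy data is already 1-D, so flattening is a plain copy.
        return ndarray(self._data)

    def __getitem__(self, key: slice) -> "ndarray":
        # Slicing an ndarray yields an ndarray.
        return ndarray(self._data[key])

    def __len__(self) -> int:
        return len(self._data)


def takes_ndarray(a: ndarray) -> int:
    return len(a)


arr = ndarray([1.0, 2.0, 3.0, 4.0, 5.0])
print(takes_ndarray(arr[0:3]))       # 3
print(takes_ndarray(arr.flatten()))  # 5
```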

shoyer commented 7 years ago

Agreed. _ArrayLike is a useful class for annotating function signatures, but I don't think it makes sense for methods.
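A sketch of that split, using numpy.typing.ArrayLike (which ships with NumPy 1.20 and later; it is not the _ArrayLike class from these stubs): accept the loose type in the parameter, convert once with np.asarray, and promise a concrete ndarray in the return. The function normalize is hypothetical.

```python
import numpy as np
import numpy.typing as npt


def normalize(data: npt.ArrayLike) -> np.ndarray:
    """Scale values so they sum to 1; accepts lists, tuples or arrays."""
    arr = np.asarray(data, dtype=float)  # the only place the array-like is touched
    return arr / arr.sum()


print(normalize([1, 1, 2]).tolist())  # [0.25, 0.25, 0.5]
```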

dmoisset commented 7 years ago

@kjyv can you show a small sample snippet that demonstrates this? I usually try to use abstractions in annotations instead of concrete types (that is usually what you mean in Python, given duck typing), although there are some tricky aspects of numpy semantics that can make this general advice wrong.

shoyer commented 7 years ago

We are sometimes a little sloppy with terminology, but it can be useful to distinguish between "array-likes" and "duck arrays" (I think that's the source of our confusion here).

To quote @njsmith:

NB: we should probably be careful to distinguish between "array-likes" (which is a term that's already well established to mean "anything that can be passed to np.asarray", and includes scalars, lists, memoryviews, among others), versus what we've been calling "duck arrays", i.e. objects that act like ndarray while not actually being ndarrays or even necessarily convertible to ndarrays.
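To make the quoted definition concrete: each object below is an array-like in this sense, because np.asarray accepts it and returns an ndarray. The memoryview case assumes a buffer of C doubles built with the standard array module.

```python
from array import array

import numpy as np

candidates = [
    3.5,                                      # scalar -> 0-d array
    [[1, 2], [3, 4]],                         # nested list -> 2-D array
    memoryview(array("d", [1.0, 2.0, 3.0])),  # buffer protocol -> 1-D float64 array
]

for obj in candidates:
    converted = np.asarray(obj)
    print(type(obj).__name__, "->", type(converted).__name__, converted.shape)
```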

We definitely want at least to support NumPy ArrayLike types (which, given the way NumPy currently works, means basically any arbitrary object), but DuckArray should be a stronger constraint of some form. The challenge is that DuckArray is not entirely well defined, because various NumPy functions that handle duck arrays only look for the particular properties they need. There's no single notion of a duck array, so it should really be considered a collection of protocols.
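One way to read "a collection of protocols" in code: instead of one DuckArray type, each function names only the attributes it actually touches. A sketch with typing.Protocol (Python 3.8+); HasShape, ndim and ToyDuckArray are hypothetical names, not part of numpy or these stubs.

```python
from typing import Protocol, Tuple, runtime_checkable


@runtime_checkable
class HasShape(Protocol):
    """Anything exposing a numpy-style ``shape`` tuple."""

    @property
    def shape(self) -> Tuple[int, ...]: ...


def ndim(x: HasShape) -> int:
    # Only .shape is required, so any duck array with a shape qualifies.
    return len(x.shape)


class ToyDuckArray:
    """Not an ndarray and not built on numpy, but it exposes a shape."""

    def __init__(self, shape: Tuple[int, ...]) -> None:
        self._shape = shape

    @property
    def shape(self) -> Tuple[int, ...]:
        return self._shape


duck = ToyDuckArray((4, 5, 6))
print(isinstance(duck, HasShape), ndim(duck))  # True 3
```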

Supposing that @overload is handled in order of definition (https://github.com/python/typing/issues/253), a "proper" type definition for transpose might look something like:

from typing import Optional, Tuple, overload

@overload
def transpose(array: SupportsTranspose, indices: Optional[Tuple[int, ...]] = None) -> SupportsTranspose: ...

@overload
def transpose(array: ArrayLike, indices: Optional[Tuple[int, ...]] = None) -> ndarray: ...

where SupportsTranspose indicates an object with a .transpose method.
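Assuming that "has a .transpose method" is all the protocol requires, SupportsTranspose can be spelled as a typing.Protocol (Python 3.8+). Grid is a hypothetical duck array used only for illustration; note that isinstance against a runtime-checkable protocol only checks that the method exists, not its signature.

```python
from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable
class SupportsTranspose(Protocol):
    """Structural type: any object that has a .transpose method."""

    def transpose(self, *axes: int) -> Any: ...


class Grid:
    """Hypothetical duck array: a 2-D grid stored as nested lists."""

    def __init__(self, rows: List[List[int]]) -> None:
        self.rows = rows

    def transpose(self, *axes: int) -> "Grid":
        # Axes are ignored: a 2-D grid has only one non-trivial transpose.
        return Grid([list(col) for col in zip(*self.rows)])


g = Grid([[1, 2, 3], [4, 5, 6]])
print(isinstance(g, SupportsTranspose))                 # True
print(isinstance([[1, 2], [3, 4]], SupportsTranspose))  # False: lists have no .transpose
print(g.transpose().rows)                               # [[1, 4], [2, 5], [3, 6]]
```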

In practice, this gets pretty complex, and I'm not sure it's worth the trouble. Someone using type checking would probably be happy with slightly stricter functions, even if they aren't defined on everything. So the latter signature might be enough for now.
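A sketch of that stricter, single-signature version, using numpy.typing.ArrayLike from modern NumPy as the input type; the name strict_transpose is hypothetical. Everything is coerced with np.asarray, so the return type is always ndarray. The trade-off is that a duck array passed in comes back as a plain ndarray (or does not convert meaningfully) instead of keeping its own type.

```python
from typing import Optional, Sequence

import numpy as np
import numpy.typing as npt


def strict_transpose(array: npt.ArrayLike,
                     axes: Optional[Sequence[int]] = None) -> np.ndarray:
    # Coerce first, so the result is a real ndarray whatever came in.
    return np.transpose(np.asarray(array), axes)


print(strict_transpose([[1, 2, 3], [4, 5, 6]]).tolist())            # [[1, 4], [2, 5], [3, 6]]
print(strict_transpose(np.zeros((2, 3, 4)), axes=(2, 0, 1)).shape)  # (4, 2, 3)
```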

kjyv commented 7 years ago

@dmoisset Here is an example: arr passes fine (creation routines return ndarray), while arr2 gives a type error.

import numpy as np

def test(data):
    # type: (np.ndarray) -> None
    print(data)

arr = np.ones(5)
test(arr)  # fine

arr2 = arr[0:3]
test(arr2)  # type error: arr2 is inferred as np._ArrayLike, not np.ndarray

I now wonder why I didn't try np._ArrayLike as the expected input type, since it covers both cases. The README does not suggest this, but perhaps the correct answer is simply to use that type. Even so, the methods defined on _ArrayLike should return ndarray, since no corresponding numpy method returns an array-like.