konradhalas / dacite

Simple creation of data classes from dictionaries.

TypeError for generic numpy.ndarray fields #155

Open tgpfeiffer opened 3 years ago

tgpfeiffer commented 3 years ago

Since numpy 1.21, the numpy.ndarray class is generic for mypy and needs to be annotated with type parameters. Unfortunately, this means I can no longer use dacite.from_dict when one member of the dictionary is such an ndarray.

Minimal example:

from dataclasses import dataclass
from typing import List
import dacite
import numpy as np
import numpy.typing as npt

@dataclass
class Foo:
    x: List[float]

@dataclass
class Bar:
    x: npt.NDArray[np.float64]

print(dacite.from_dict(data_class=Foo, data={"x": [1.0, 2.0]}))
print(dacite.from_dict(data_class=Bar, data={"x": np.array([1.0, 2.0])}))

In the code above, Foo works fine, but creating Bar fails with

Traceback (most recent call last):
  File "nptest.py", line 20, in <module>
    print(dacite.from_dict(data_class=Bar, data={"x": np.array([1.0, 2.0])}))
  File ".../python3.8/site-packages/dacite/core.py", line 60, in from_dict
    transformed_value = transform_value(
  File ".../lib/python3.8/site-packages/dacite/types.py", line 36, in transform_value
    return collection_cls(transform_value(type_hooks, cast, item_cls, item) for item in value)
TypeError: expected sequence object with len >= 0 or a single integer 

Adding config=dacite.Config(check_types=False) to the from_dict() call does not help either; it fails with the same error, because the exception is raised while transforming the value, before the type check even runs.

If I change the type of Bar.x from npt.NDArray[np.float64] to a plain np.ndarray, it works again, but then mypy complains:

error: Missing type parameters for generic type "ndarray"  [type-arg]

I am not exactly sure what that error message means. Is there anything I can do to work around this?

tgpfeiffer commented 3 years ago

@konradhalas May I ask for your opinion on how I should address this issue?

As I wrote in the issue above, generic type annotations for numpy.ndarray are possible since numpy >= 1.20, and mypy requires them unless a # type: ignore[type-arg] comment is used. numpy.typing.NDArray[T] is an alias for numpy.ndarray[Any, numpy.dtype[T]], and with that alias we run into a number of issues with the conversion in https://github.com/konradhalas/dacite/blob/d2206b2e4711859da0ea5862c395940f33693e80/dacite/types.py#L25-L36, because the "new" parameterized ndarray is considered a generic collection by is_generic_collection(), while the "old" plain ndarray is not:

  1. The code extracts the item type of the generic collection with item_cls = extract_generic(target_type, defaults=(Any,))[0], i.e., it always uses the first generic argument. For numpy, however, the correct one would be the second (the dtype).
  2. The code builds the instance with collection_cls(transform_value(type_hooks, cast, item_cls, item) for item in value), but numpy.ndarray cannot be constructed from a generator. That is the actual cause of the error message I shared in the issue description.
  3. The code assumes that passing the items of the converted collection back into the constructor reproduces the collection, but that does not hold for numpy.ndarray: if the items are [1, 2, 3] and we pass that list to the numpy.ndarray constructor, it builds an uninitialized array of shape (1, 2, 3), not an array with those values. (All three points are illustrated in the snippet below.)
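
Here is a standalone sketch of all three points (no dacite involved); the results noted in the comments are what I would expect with numpy >= 1.21:

from collections.abc import Collection

import numpy as np
import numpy.typing as npt

hint = npt.NDArray[np.float64]

# (1) The alias reports numpy.ndarray as its origin and two generic
# arguments; the dtype is the second one, so extract_generic(...)[0]
# picks up Any instead of the dtype:
print(hint.__origin__)  # <class 'numpy.ndarray'>
print(hint.__args__)    # (typing.Any, numpy.dtype[numpy.float64])

# numpy.ndarray also satisfies the Collection protocol, which is why
# is_generic_collection() returns True for the parameterized alias:
print(issubclass(hint.__origin__, Collection))  # True

# (2) The ndarray constructor cannot consume a generator:
try:
    np.ndarray(x for x in [1.0, 2.0])
except TypeError as e:
    print(e)  # expected sequence object with len >= 0 or a single integer

# (3) Even with a real sequence, the constructor interprets the values
# as a shape, so [1, 2, 3] gives an uninitialized 1x2x3 array:
print(np.ndarray([1, 2, 3]).shape)  # (1, 2, 3)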

All of these assumptions would have to be addressed for numpy.ndarray to work like a normal generic collection, and I don't feel it is reasonable to do so.

Therefore I would like to suggest adding an exception specifically for numpy.ndarray that prevents this class from being treated as a generic collection, something along the lines of

--- a/dacite/types.py
+++ b/dacite/types.py
@@ -142,11 +142,15 @@ def is_generic_collection(type_: Type) -> bool:
         return False
     origin = extract_origin_collection(type_)
     try:
-        return bool(origin and issubclass(origin, Collection))
+        return bool(origin and issubclass(origin, Collection) and not skip_generic_conversion(origin))
     except (TypeError, AttributeError):
         return False

+def skip_generic_conversion(origin: Type) -> bool:
+    return origin.__module__ == "numpy" and origin.__qualname__ == "ndarray"
+
+
 def extract_generic(type_: Type, defaults: Tuple = ()) -> tuple:
     try:
         if hasattr(type_, "_special") and type_._special:

which makes the conversion work both for plain numpy.ndarray and for numpy.typing.NDArray[numpy.float64] members. This is a bit hacky, but in particular it means we don't have to introduce a dependency on numpy. Would that be an acceptable approach for you?
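
For reference, the new predicate can be sanity-checked on its own, using the same __origin__ attribute that extract_origin_collection() already relies on:

from typing import List

import numpy as np
import numpy.typing as npt

def skip_generic_conversion(origin: type) -> bool:
    return origin.__module__ == "numpy" and origin.__qualname__ == "ndarray"

print(skip_generic_conversion(npt.NDArray[np.float64].__origin__))  # True
print(skip_generic_conversion(List[int].__origin__))                # False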

One issue that still happens with this is that the converted value fails the type check at the end:

wrong value type for field "a" - should be "ndarray" instead of value "[1 2 3]" of type "ndarray"

which happens because in https://github.com/konradhalas/dacite/blob/d2206b2e4711859da0ea5862c395940f33693e80/dacite/types.py#L135 we have something like

>>> isinstance(numpy.array([1,2,3]), numpy.typing.NDArray[numpy.float64])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tgp/.envs/dacite/lib64/python3.8/site-packages/numpy/typing/_generic_alias.py", line 137, in __instancecheck__
    raise TypeError("isinstance() argument 2 cannot be a "
TypeError: isinstance() argument 2 cannot be a parameterized generic
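
Checking against the unsubscripted origin class, on the other hand, does work, which suggests a possible fix:

import numpy as np
import numpy.typing as npt

hint = npt.NDArray[np.float64]
value = np.array([1.0, 2.0])

# isinstance() rejects parameterized generics, but the plain origin
# class extracted from the alias is accepted:
print(isinstance(value, hint.__origin__))  # True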

I am not entirely sure how to address this best. I have tried to add a check like

--- a/dacite/types.py
+++ b/dacite/types.py
@@ -127,6 +127,9 @@ def is_instance(value: Any, type_: Type) -> bool:
         return is_instance(value, extract_init_var(type_))
     elif is_type_generic(type_):
         return is_subclass(value, extract_generic(type_)[0])
+    elif is_generic(type_):
+        origin = extract_origin_collection(type_)
+        return isinstance(value, origin)
     else:
         try:
             # As described in PEP 484 - section: "The numeric tower"

which does work, but it makes the test_is_instance_with_not_supported_generic_types test fail. I don't have the insight to tell whether that test was added mostly for documentation purposes, so that it would be OK to remove it, or whether you actually want to keep this behavior.

I have a PR more or less ready, but I wanted to check first whether you have any opinion on the proposed changes. Please let me know your thoughts.

Mjboothaus commented 2 years ago

Yes - this is a problem for me too :(

     67     if config.check_types and not is_instance(value, field.type):
---> 68         raise WrongTypeError(field_path=field.name, field_type=field.type, value=value)
     69 except KeyError:
     70     try:

WrongTypeError: wrong value type for field "z" - should be "array" instead of value "[0. 0. 0. ... 0. 0. 0.]" of type "ndarray"