lidatong / dataclasses-json

Easily serialize Data Classes to and from JSON
MIT License
1.34k stars 150 forks source link

Warn about buggy type resolution #529

Closed DavidCain closed 3 weeks ago

DavidCain commented 1 month ago

This type annotation resolution is a flawed implementation -- ideally a proper fix can come later, but for now it's worth warning about the present behavior.

dataclasses-json does not behave correctly when using string-style annotations (a frequent tactic for annotating types that have not yet been defined at that point in the file).

The core problem comes from the assumption that type annotations in a module will match sys.modules() -- maybe_resolved will frequently resolve to a different value than a typechecker would!

A short example of the error

example_types.py:

from dataclasses import dataclass

from dataclasses_json import dataclass_json

@dataclass_json
@dataclass
class Config:
    options: list["Option"]

@dataclass_json
@dataclass
class Option:
    label: str

Expected behavior

_decode_items correctly identifies Option as coming from example_types:

>>> from example_types import Config
>>> print(Config.from_dict({"options": [{"label": "repro"}]}).options)
[Option(label='repro')]

Unexpected behavior

_decode_items() incorrectly identifies "Option" as from a third party module that is not in scope at time of annotation:

repro.py:

import click.parser  # Has `Option`
from example_types import Config

print(Config.from_dict({"options": [{"label": "repro"}]}).options)
$ python3.10 repro.py
[{'label': 'repro'}]

Related fix -- truthiness

The conditional if maybe_resolved is meant to handle the case of an attribute not being found in the module (it could basically be written as if maybe_resolved is not None). However, that doesn't cover the case of types that are falsy at runtime!

CustomNoneType = None

values: list['CustomNoneType'] = [None, None]

We can just use hasattr() instead.

DavidCain commented 1 month ago

I expect that https://github.com/lidatong/dataclasses-json/issues/442 may well address this issue with a refactor (which is why I didn't make any effort to change behavior here). However, I at least wanted to document this issue & hopefully help others who stumble on it!

DavidCain commented 1 month ago

For anybody else reading this (and using this package on Python 3.7 through 3.10) a good workaround is to use from __future__ import annotations -- that way you can use postponed evaluations:

+from __future__ import annotations

 @dataclass_json
 @dataclass
 class Config:
-    options: list["Option"]
+    options: list[Option]
DavidCain commented 1 month ago

@lidatong - do you have any thoughts about this? I'm hoping a warning can save some others a lot of time & confusion -- took me a little while to realize that this package resolves names to the first module that happens to have a variable of that name.

george-zubrienko commented 4 weeks ago

@DavidCain is it possible for you to add some units proving the fix? Also note that after 3.11 this is not super relevant since 'self' types are resolved there w/o string hacks

george-zubrienko commented 4 weeks ago

For anybody else reading this (and using this package on Python 3.7 through 3.10) a good workaround is to use from __future__ import annotations -- that way you can use postponed evaluations:


+from __future__ import annotations

 @dataclass_json

 @dataclass

 class Config:

-    options: list["Option"]

+    options: list[Option]

No you should not use that import with DCJ as it breaks the type handling in a lot of cases. Please review documentation :)

DavidCain commented 4 weeks ago

@DavidCain is it possible for you to add some units proving the fix?

@george-zubrienko - I can certainly add unit tests. Just for clarity though, are you suggesting that I:

Also note that after 3.11 this is not super relevant since 'self' types are resolved there w/o string hacks

Yup, agreed that Python 3.11 helps! That's why I recommended from __future__ import annotations -- it's basically opting into Python 3.11 postponed evaluations.

Though note that the issue demonstrated here is not annotating the type of the class itself, but instead annotating a class that comes later in the file and hasn't been interpreted yet at time of declaration.

DavidCain commented 4 weeks ago

No you should not use that import with DCJ as it breaks the type handling in a lot of cases. Please review documentation :)

@george-zubrienko , I assume you're referring to the line here -- https://github.com/lidatong/dataclasses-json?tab=readme-ov-file#handle-recursive-dataclasses

I understand that the __future__ import may not be "sanctioned" but given the choice between whatever bugs may arise from Python 3.11-style typing or "pick the first module which happens to have a variable of that name" that's an easy choice.

In other words, using 'Option' creates a bug 100% of the time, but I haven't seen bugs from using annotations). Finally, it should be safe to use Python's __future__ imports! To not support future imports is to be unable to support newer Python versions.

george-zubrienko commented 4 weeks ago

Finally, it should be safe to use Python's future imports! To not support future imports is to be unable to support newer Python versions.

You misunderstand the concept of future imports. Those are things that might not even make it into next release. Using those is always at your own risk and no, no stable and reliable code should rely or support future imports. This is especially important when you are working with type handling which in Python is screwed to the rest of the language with rusty nails

george-zubrienko commented 4 weeks ago

I understand that the future import may not be "sanctioned" but given the choice between whatever bugs may arise from Python 3.11-style typing or "pick the first module which happens to have a variable of that name" that's an easy choice.

Easy choice that works for easy cases. Metaclasses, apps that rely on pickling such as pyspark will have issues with that specific import, not to mention that by adding it you implicitly add a necessity to verify all your other imports behaviours as they might be affected as well.

george-zubrienko commented 4 weeks ago

test this current approach of maintaining the buggy behavior (but emitting warnings?) or change the way string annotations are resolved, fixing the bug

It is a tough one to fix pre-3.11. You'll have play the string matching game which is tough, I fixed that for one of the cases some months ago, but I feel like its just chasing the mice without addressing the root issue. Unfortunately we have to wait for 3.10 EOL, so a warning should be fine, I'd just like to see smth that covers the code you are adding If you have a good idea for a proper fix, I would be glad to review and get it in if implemented

DavidCain commented 4 weeks ago

You misunderstand the concept of future imports. Those are things that might not even make it into next release.

Fair enough! I know that any future import may ultimately never be added to the language ("MandatoryRelease may also be None, meaning that a planned feature got dropped or that it is not yet decided.")

https://docs.python.org/3/library/__future__.html

What I meant to say is that all current possible imports in __future__ already have been released (or will be released). They should all be safe to use.

>>> from __future__ import annotations
>>> annotations.getMandatoryRelease()
(3, 11, 0, 'alpha', 0)

Using those is always at your own risk and no, no stable and reliable code should rely or support future imports.

Disagree -- __future__ is a canonical way to support multiple Python versions at once.

This is especially important when you are working with type handling which in Python is screwed to the rest of the language with rusty nails

Hah no disagreement there, but again really can't stress enough how much "pick the first module that happens to have a variable of this name" is also screwed.

DavidCain commented 4 weeks ago

It is a tough one to fix pre-3.11. You'll have play the string matching game which is tough, I fixed that for one of the cases some months ago, but I feel like its just chasing the mice without addressing the root issue. Unfortunately we have to wait for 3.10 EOL, so a warning should be fine, I'd just like to see smth that covers the code you are adding If you have a good idea for a proper fix, I would be glad to review and get it in if implemented

Great, I'd be happy to add some test coverage for what I've done here! If I have the time for a proper fix, I can follow up with a proposal.

george-zubrienko commented 4 weeks ago

pick the first module that happens to have a variable of this name" is also screwed.

We have a running joke in the office that all Python development is hope-driven ;) If you manage to get a quick test for the warning in, I'll release this one and one more for abstract classes on the weekend. Also small disclaimer, I plan to remove tests for 3.7 and 3.8 as being required. If you run into issues with 3.7 in future contributions, just assert True/accept failure if ver <=3.9