lidatong / dataclasses-json

Easily serialize Data Classes to and from JSON
MIT License
1.34k stars 151 forks source link

Improve dataclass_json and _process_class type annotations #475

Closed ringohoffman closed 10 months ago

ringohoffman commented 10 months ago

Use typing.overload to differentiate the return type of dataclass_json based on whether _cls parameter is None or not.

This is motivated by fixing type issues in flytekit, which does not only use dataclass_json as a decorator.

from dataclasses import dataclass

from dataclasses_json.api import dataclass_json

@dataclass
class Point:
    x: int
    y: int

directly_wrapped = dataclass_json(Point)

reveal_type(directly_wrapped)
# Before:
# pyright: Type of "directly_wrapped" is "((cls: Any) -> Any) | Any"
# mypy: Revealed type is "Any"
# PyCharm: directly_wrapped: (cls: {to_json, from_json, to_dict, from_dict, schema, __init__, dataclass_json_config}) -> {to_json, from_json, to_dict, from_dict, schema, __init__, dataclass_json_config} | {to_json, from_json, to_dict, from_dict, schema, __init__, dataclass_json_config} = dataclass_json(Point)

# After:
# pyright: Type of "dataclass_json(Point)" is "type[Point]"
# mypy: Revealed type is "Type[a.Point]"
# PyCharm: directly_wrapped: Type[Point] = dataclass_json(Point)

indirectly_wrapped = dataclass_json()(Point)

reveal_type(indirectly_wrapped)
# Before:
# pyright: Type of "indirectly_wrapped" is "Any"
# mypy: Revealed type is "Any"
# PyCharm: indirectly_wrapped: {to_json, from_json, to_dict, from_dict, schema, __init__, dataclass_json_config} = dataclass_json()(Point)

# After:
# pyright: Type of "dataclass_json()(Point)" is "type[Point]"
# mypy: Revealed type is "Type[a.Point]"
# PyCharm: indirectly_wrapped: Type[Point] = dataclass_json()(Point)

####################################################################################################

@dataclass_json
@dataclass
class Point2:
    x: int
    y: int

reveal_type(Point2)
# Before:
# pyright: Type of "Point2" is "type[Point2]"
# mypy: Revealed type is "def (x: builtins.int, y: builtins.int) -> a.Point2"
# PyCharm: class Point2

# After:
# pyright: Type of "Point2" is "type[Point2]"
# mypy: Revealed type is "def (x: builtins.int, y: builtins.int) -> a.Point2"
# PyCharm: class Point2

####################################################################################################

@dataclass_json()
@dataclass
class Point3:
    x: int
    y: int

reveal_type(Point3)
# Before:
# pyright: Type of "Point3" is "type[Point3]"
# mypy: Revealed type is "def (x: builtins.int, y: builtins.int) -> a.Point3"
# PyCharm: class Point3

# After:
# pyright: Type of "Point3" is "type[Point3]"
# mypy: Revealed type is "def (x: builtins.int, y: builtins.int) -> a.Point3"
# PyCharm: class Point3
george-zubrienko commented 10 months ago

Hi @ringohoffman! If you can provide additional information on how this change interacts with:

This would significantly speed up the review process here, otherwise I'll have to try all these myself before I can review. Community had issues with this codepiece some time ago, so I have to be extra careful here. Also a sidenote, @overload functionality is not a very stable thing to rely on for such change, but let's put that aside and get basic checks done first.

ringohoffman commented 10 months ago

What I had already posted were the changes you can see in VS Code using pylance/pyright. Pylance is a Visual Studio Code extension that uses pyright at its core for static type checking. I've gone ahead and added the before and after for PyCharm and mypy to the description as well.

adding a simple test step that runs pylint/pyright would be fine

What do you mean by this? Do you want me to add a GitHub Workflow step to run pyright and pylint?

When I run pyright on the dataclasses_json module, there are 25 pre-existing errors. This PR doesn't add to or resolve these errors. I don't think that adding a workflow step for this is feasible at this time. Moreover, I think the poor type-checking compatibility of the dataclass_json decorator would completely preclude us from enforcing no pyright errors on this codebase.

I don't see any typecheck warnings/errors when I run pylint on the code in my description, before or after. This PR does resolve a pylint warning for the unused LetterCase import in dataclasses_json/api.py since we are using it in the type hints now. Given that pylint didn't catch anything related to these changes, I see adding pylint to the workflow for this repository as a separate issue, but one that I would support.

Let me know how you want to move forward based on this and my updated description.

george-zubrienko commented 10 months ago

I think it's ok, however I'm curious if this works with IDE autocomplete? You snippets are runtime code, but autocomplete does not rely on reveal_type etc. Though people mainly complain about static type checkers, we had cases when they also want IDE autocomplete to work like it does when using a Mixin.

Other small question, we plan to deprecate this function in v1 API (coming soon). How does that affect you?

ringohoffman commented 10 months ago

I think it's ok, however I'm curious if this works with IDE autocomplete? You snippets are runtime code, but autocomplete does not rely on reveal_type etc. Though people mainly complain about static type checkers, we had cases when they also want IDE autocomplete to work like it does when using a Mixin.

Other small question, we plan to deprecate this function in v1 API (coming soon). How does that affect you?

I don't think it's possible to make dataclass_json compatible with auto-complete. I think you have to choose the return type as either the wrapped type or as DataClassJsonMixin, but either way static type checkers will lose access to part of the actually available runtime methods. I asked the author of pyright how he would handle making this function static-typing friendly, and he had recommended to just use the mixin as a mixin instead of the decorator.

This doesn't impact me in any major way. I already migrated flytekit to use DataClassJsonMixin as a mixin instead of the dataclass_json decorator: https://github.com/flyteorg/flytekit/pull/1801 They only use dataclass_json in that one place now: https://github.com/flyteorg/flytekit/blob/0a772f3621ce9bd1a3e3b2f1e691458f25a31876/flytekit/core/type_engine.py#L1549

george-zubrienko commented 10 months ago

This doesn't impact me in any major way. I already migrated flytekit to use DataClassJsonMixin

Good, though v1 deprecates both the Mixin and the annotation - check discussion in #442. Note that we'll release v1 alongside v0 and deprecation of v0 will take very long time so please do not be scared :) But it helps a lot for us to understand the use cases like the code snippet you referenced.

Thanks a lot for contribution, we have a bit of an issue with 3.7 rn, but I plan to go around it and release your commit in 0.6 in a few days.