lidatong / dataclasses-json

Easily serialize Data Classes to and from JSON
MIT License
1.34k stars 151 forks source link

[BUG] Type coercion since 0.5.9 #466

Closed jfsmith-at-coveo closed 10 months ago

jfsmith-at-coveo commented 11 months ago

Description

I'll let you guys explain that this is intended and why, but starting with 0.5.9 there is type coercion going on when using the from_dict() method.

A non-zero int becomes True on a bool field, for instance. An int becomes its stringified version on a string field.

This feels to me like a breaking change, but I don't see this anywhere in the release notes. Nor in the version number in any case.

Was this intended? If so, why?

Code snippet that reproduces the issue

from dataclasses import dataclass
from dataclasses_json import DataClassJsonMixin

@dataclass
class MyDataclass(DataClassJsonMixin):
    this_must_be_bool: bool
    this_must_be_string: str

d = {"this_must_be_bool": 1234, "this_must_be_string": 1234}
c = MyDataclass.from_dict(d)

assert isinstance(c.this_must_be_bool, bool)
assert isinstance(c.this_must_be_string, str)

Describe the results you expected

So this snippet raises with 0.5.8. Anything after and including 0.5.9 does not raise.

Python version you are using

3.11.4

Environment description

Tested with dataclasses-json 0.5.9, 0.5.12 and 0.5.14

george-zubrienko commented 11 months ago

This is not even type coercion, this is defo a regression because 1234 is not a boolean value. I'm looking into this.

george-zubrienko commented 11 months ago

This is the culprit in core.py lines 247-250

    elif _issubclass_safe(field_type, (int, float, str, bool)):
        res = (field_value
               if isinstance(field_value, field_type)
               else field_type(field_value))

I'll check where this was introduced. This PR: https://github.com/lidatong/dataclasses-json/pull/375

george-zubrienko commented 11 months ago

Now to sum up:

USSX-Hares commented 11 months ago

@george-zubrienko in v1 all these coercions should be a feature flag. It may still be the default behavior btw.

matt035343 commented 11 months ago

Not sure minor bump is needed, since I would categorize it as a patch

matt035343 commented 10 months ago

@USSX-Hares I agree that it should be a feature flag. @george-zubrienko please add to task list.

I changed my mind, I recommend bumping to 0.6.0.

jfsmith-at-coveo commented 10 months ago

Thanks everyone for the quick feedback! I still have some thoughts on this issue:

  • The 42: str to 42: int is a valid auto-conversion in many frameworks and DCJ should not be an exception, thus this should not raise, so [...] Sames goes for floats etc.

I do not use Python frameworks on a daily basis and I do not know any that would do that to strings and numerals. I think this is a bad idea, for three reasons:

1) As python is a strongly typed language, no Python developer should ever expect a type change at runtime.

2) Type annotations are annotations. I think they should stay that way. It feels wrong to have them suddenly begin to have any operational value at runtime.

3) Regardless of the above (those are just my two cents, you guys do your stuff and it's fine), it still is an unexpected change of behavior between DCJ versions. Whatever you guys decided to do with booleans, I strongly believe the other types have to be treated the same way on that basis alone.

Thanks again all!

george-zubrienko commented 10 months ago

As python is a strongly typed language, no Python developer should ever expect a type change at runtime.

Python also has dynamic typing and is an interpreted language, the very existence of Union type contradicts this statement. Type coercion is a common thing where JSON/YAML are used in real life, esp when dealing with configurations, so this is in fact a common practice. If you use app configurations (appsettings.json, HOCON etc), this is daily life, unless you go strict mode for everything. In HOCON you can implicitly convert stuff like 2s to timespan of 2 seconds and this is actually an awesome feature. If you want a strict typecheck on JSON read, this is possible to do, but I agree this should be a feature flag and v0 of DCJ doesn't offer that, but I am working on v1 PR that will enable that.

Type annotations are annotations. I think they should stay that way. It feels wrong to have them suddenly begin to have any operational value at runtime.

This statement is the opposite of the first one. If we treat everything as recommendation, then we should have coercions as default - remember JSON is a text format and doesn't carry schema information like AVRO or PARQUET. In fact, it is a quite simple thing, how target types should be treated, but you seem to be more inclined towards strict compliance, which is one way to look at this, but not the only way ;)

it still is an unexpected change of behavior between DCJ versions

I'd argue with this. As stated in the reasoning from the original author https://github.com/lidatong/dataclasses-json/pull/375, since DCJ has marshmellow interop, quote

This differs from marshmallow decode behavior which properly converts to the integer value.

we should behave the way marshmellow users expect, and lack of type coercion was a bug, that turned out to be a feature for some. Thus, I agree that for some users - definitely, for others - long awaited fix that doesn't warrant a minor bump.

However, even the boolean change I've opened a PR for can cause even more issues to be opened, where people either:

Thus, I will keep this issue opened and ref it to #442 where we will make both crowds happy by moving this behaviour to a feature flag, so everybody can have their own little DCJ doing the right thing. Until then, help welcome/stay tuned :)

jfsmith-at-coveo commented 10 months ago

Python also has dynamic typing and is an interpreted language, the very existence of Union type contradicts this statement.

I don't want this to sidetrack into unrelated discussions and all, but I should point out that you are confusing dynamic/static with weak/strong typing. You should know that python is both dynamic AND strongly typed. Names that refer to objects are not bound to any type and may be reassigned freely. That's the dynamic part, and that's what Union is for. Union is not a type for the object it annotates, i.e. it's not your object that is "Union", it's the annotation itself. And that tells the type checker that the name X may refer to an object of either of A or B (or more) type. At runtime X will be strictly typed, and it will be A or B, and Union will have no existence. That's all what this means. Type is not dynamic. References are.

That's the context of my intervention here. Like I said, you proceed as you see fit, but I do believe that treating python dataclasses as JSON/YAML-like objects (by default or not) is, though convenient, definitely not pythonic. :sweat_smile:

george-zubrienko commented 10 months ago

I don't want this to sidetrack into unrelated discussions and all

That is exactly what is happening here. If you are unhappy with the response to the issue and you think some changes are in order, that is one thing and can be discussed, but so far I do not see any response to the reasons outlined in my response, including the marshmellow interop alignment, which was the main reason for this change in the first place. We have to take into account the broader audience here :)

In case you create an issue and put a bug label on it, we must investigate and decide on the best course of action, which includes you as the issue author, taking responsibility and helping us in this journey - for example push for minor version bump - do you think this is needed, or not? The but I do believe that treating python dataclasses as JSON/YAML-like objects (by default or not) is, though convenient, definitely not pythonic does not really help to decide the course of action and frankfully just adds clutter to the issue thread, we could have gone with a Discussion instead then ;)

You should know that python is both dynamic AND strongly typed

I never said the opposite :) Important part to note that we are dealing with text format serde in a language where you effectively have very little control of the actual type you are getting after deserialization, the magic "at runtime strictly typed" X.a, if you seen that part of DCJ source code, is actually decided by the type hint on a dataclass field, because there is no other way to get that information. This can lead to issues, like the ones we see with booleans. That being said, please focus on the original problem and what do you think of the proposed solution above. I'd like to read that part instead.

jfsmith-at-coveo commented 10 months ago

Those are fair points. My apologies. I do acknowledge there is a broader context here with DCJ, which I tend to forget. I mainly use from_dict(), with which the stakes are simpler. I have nothing else to say about the solution, in the sense that in the end I believe it is the developers' call. Bumping to 0.6.0 is fine! :smiley:

george-zubrienko commented 10 months ago

Bumping to 0.6.0 is fine

Good, I tend to think we should have probably done this too, which @matt035343 agreed as well. But given that the damage has already been done, what do you think is the best course of action? Should we just bump the next release, will that be an acceptable solution?

jfsmith-at-coveo commented 10 months ago

Bumping to a new version and then documenting seems to me the best thing to do in this case. In the release notes for 0.6.0, just specify that there's a fix for a regression introduced in 0.5.9, and then explain what is the intended behavior from this version onward.

george-zubrienko commented 10 months ago

Alright, then this is what we'll do. I think we'll have some commits to release by end of this week, so expect 0.6.0 around Sunday :)

george-zubrienko commented 10 months ago

Opened an issue for v1 - when we get there, would be nice if you have time to participate in the discussion :) Thanks a lot for bringing this up!

george-zubrienko commented 10 months ago

Small update, moving this to next week as we might have another commit which can introduce some funky behaviours, so I'm pushing 0.6.0 to next release some time next week

george-zubrienko commented 10 months ago

@jfsmith-at-coveo v0.6.0 released with a note on type coercion and a few other things. I'm closing this issue as resolved and promise this will be further addressed in v1 API when it comes out.