python-attrs / cattrs

Composable custom class converters for attrs, dataclasses and friends.
https://catt.rs
MIT License
799 stars 111 forks source link

Implement a disambiguator for discriminated unions #392

Closed A5rocks closed 11 months ago

A5rocks commented 1 year ago

Another way at "tagged unions" except backwards compatible! Also, a ton of services provide data in this fashion.

Look at the tests for some examples!

A5rocks commented 1 year ago

Hi @Tinche, just pinging in case you missed this!

Tinche commented 1 year ago

Hi there! So here are some of my thoughts.

I used to apply this pattern consistently at my $DAYJOB since, as you say, it's simple, powerful and widely used. Although my classes had defaults for the tag field so they could be instantiated directly, not just through cattrs.

But the main issue is it goes against some fundamental cattrs design principles. It has two major problems.

The first one is that you need to have the tag as a Literal field on the classes. In Python, this field is useless: the tag is the actual class. So it just sits there using memory and cpu cycles in __init__ et al. If you think about it, it should really be a ClassVar, and then if you think about it some more, that information is already present in the class object.

The second issue is it puts un/structuring data on the model itself. cattrs is fundamentally an exercize in decoupling serialization logic from models.

The tagged union strategy introduced in the last release solves both of these issues. It was the result of learnings I had using this Literal approach.

That said, maybe there's a way forward here. The one place where cattrs has historically strayed from its design principles was in the default union disambiguator. I'm good with going for convenience here.

So I would be open to you combining this work with the default disambiguator. Currently, the default disambiguator looks for unique field names; how about we add a pass before where it would look for Literal ClassVars?

A5rocks commented 1 year ago

Overall I think this makes sense! I do agree this does kinda suck but the reason I made this PR is cause I'm autogenerating a discriminated enum and it's quite annoying having to autogenerate cattrs converters too (even if I think I finally figured it out).

Just one question:

So I would be open to you combining this work with the default disambiguator. Currently, the default disambiguator looks for unique field names; how about we add a pass before where it would look for Literal ClassVars?

I'm not sure what this entails. I've placed this next to where the other disambiguator is used; would you rather I rename the other one and then stick this in that one? Provide a nicer way to signalling "this doesn't work" instead of raising exceptions and catching it? I just need more details!

Tinche commented 1 year ago

Yeah, I was thinking you remove the disambiguator you created, and put the logic from it into the default disambiguator.

So the default disambiguator first looks for literal fields, and then does the unique field logic on the rest of the classes.

Tinche commented 1 year ago

Howdy, how are we looking here? Trying to plan the next release.

A5rocks commented 1 year ago

Sorry, I think this works as you mentioned.

Tinche commented 1 year ago

Just a small note, I'll be away for the next 3 weeks on a family trip and I can continue reviewing this after I'm back!

Tinche commented 11 months ago

Alright, I'm back so let's get this over the line and release the next version!

Looks like it needs a rebase and a linting error fixed.

A5rocks commented 11 months ago

I'm going to assume the linting errors are fixed because they're in files this PR doesn't touch xd

(I'll see whenever the workflows actually get approved...)

Tinche commented 11 months ago

I dropped 3.7 support on main, that would explain these failures ;)

A5rocks commented 11 months ago

I'm a bit confused as to why isort and ruff are complaining about files I didn't modify :(

I've gone ahead and fixed the obvious isort issue but I'm not sure what else it doesn't like -- I'll try running it locally in a bit :^)

Oh. Ruff complains on main branch and I forgot I added isort to run in CI.

... I believe isort is complaining because the environment is different (it doesn't complain locally). I'll remove it; I don't have the CI expertise for this!

A5rocks commented 11 months ago

Well, after a lot of problems with linters it looks like that saga's over :D

Tinche commented 11 months ago

Thanks, I'll merge this in now.

I might tweak the docs a little and work on a way to disable this by default; there's a change this might break someone and they need to have an escape hatch to the old behavior, but since I think the chance is low the escape hatch doesn't need to be super simple.

What do you think if we also change it to allow literal fields with defaults?

A5rocks commented 11 months ago

I think as long as there's only one type with a default for a literal (or missing literal) then that's fine. Or if it's fine for the disambiguation function to raise if it's given a dictionary without that literal key.

Tinche commented 11 months ago

What's the use case that's concerning you?

I think we can be more lax with literals since defining a field as field_x: Literal['x'] = 'x' is very uncommon, I think you'd only do it for serialization.

A5rocks commented 11 months ago

Hm, I'm mostly concerned about deserializing an object missing that literal key.

Tinche commented 11 months ago

But this strategy is about serialization, not deserialization ;)

I want to understand so I don't mess up your use cases, since you're the original contributor. Can you sketch a small example for me?

A5rocks commented 11 months ago

Errrr, sorry I messed up my words. The counter example I'm thinking is:

import attrs
from typing import Literal
import cattrs

@attrs.define()
class A:
  x: Literal[4] = 4

@attrs.define()
class B:
  x: Literal[5] = 5

cattrs.structure({"x": 4}, A | B)  # this might still work
cattrs.structure({}, A | B)  # but what about this?

... I'm probably misunderstanding something. (I'm not sure what is "serialization" and what is "deserialization" with cattrs...)