python / typing

Python static typing home. Hosts the documentation and a user help forum.
https://typing.readthedocs.io/
Other
1.6k stars 237 forks source link

Dynamic and Discriminated Unions #1467

Open irgolic opened 1 year ago

irgolic commented 1 year ago

When modeling semi-structured data, discriminated unions are an invaluable tool in your arsenal. It allows you to model a variable unit of data, where each unit concisely specifies its type and payload. This is a common pattern in event-driven architectures, where you have a stream of "events", "commands", or "actions", each of which is a discriminated union of event types.

I'd like to propose two new features to the typing module that formalize dynamic and discriminated unions.

1️⃣ Dynamic Unions

Current state

Assume we have a Status model that is a discriminated union of StatusA and StatusB:

from typing import Literal, Union
from pydantic import BaseModel

class StatusBase(BaseModel):
    status: str

class StatusA(StatusBase):
    status: Literal['a']

class StatusB(StatusBase):
    status: Literal['b']

To define a union that is understood by the type-checker at static-analysis time, you declare it by mentioning every member of the union:

Status = Union[StatusA, StatusB]
reveal_type(Status)  # Type of "Status" is "type[StatusA] | type[StatusB]"
print(Status)  # typing.Union[__main__.StatusA, __main__.StatusB]

Whenever adding a new subclass of StatusBase, this declaration must be updated.

I often abuse a pattern of lazily declaring a dynamic union instead of maintaining a static one, by tying all the members of the union to the subclasses of a parent class (in this case StatusBase):

Status = Union[tuple(StatusBase.__subclasses__())]
reveal_type(Status)  # Type of "Status" is "Unknown"
print(Status)  # typing.Union[__main__.StatusA, __main__.StatusB]

The type-checker is not able to infer the type of Status, because it's only compiled at runtime.

Proposed feature

I propose a new feature that would allow the type-checker to infer a dynamic union of all StatusBase subclasses, and finally error if more subclasses are declared afterward.

from typing import DynamicUnion  # Currently not a thing

Status = DynamicUnion[StatusBase]
reveal_type(Status)  # Type of "Status" is "type[StatusA] | type[StatusB]"
print(Status)  # typing.Union[__main__.StatusA, __main__.StatusB]

(Maybe call it SubclassUnion instead of DynamicUnion?)

The type-checker should detect if a subclass of StatusBase is declared after Status, and raise an error.

Status = DynamicUnion[StatusBase]

class StatusC(StatusBase):  # Error: Subclass of StatusBase declared after declaration of DynamicUnion[StatusBase]
    status: Literal['c']

This could work like the @sealed decorator was proposed to work in the Sealed Typing PEP draft. More useful information found on this email chain.

The @sealed decorator was designed as part of PEP 622 (structural pattern matching). Think of it as confining a type and its subclasses to a single module, so that the type-checker can dynamically infer a union of the subclasses. These are also known as algebraic data types.

I'd like to also explore the possibility of scoping algebraic data types to a package, or a dynamic scope bound between the parent class declaration and the union declaration.

2️⃣ Discriminated unions

Current state

For lack of a formalization of discriminated unions in core python, data (de)serialization libraries turn to their own objects. For example, pydantic specifies a discriminator value with Field, and uses Annotated to tag the union:

from typing import Annotated
from pydantic import Field

AnnotatedStatus = Annotated[
    Status,
    Field(discriminator='status')
]

This does not raise an error at runtime if any member of the Status union does not have a status field UNTIL AnnotatedStatus is used as a type on the field of a pydantic model declaration.

Annotated is an "escape hatch" that allows clients to express things that type-checkers might not understand.

However, there is a push by third-party libraries for objects in the typing module that would canonically be used with the Annotated type. It signals a willingness to move from redundant proprietary internals toward ubiquitous core python. See PEP 727 and related discussion.

Proposed feature

I propose a new feature that would allow the type-checker to infer the type of AnnotatedStatus, by declaring it as a dynamic discriminated union:

from typing import DiscriminatedUnion  # Currently not a thing

AnnotatedStatus = DiscriminatedUnion[Status, 'status']
reveal_type(AnnotatedStatus)  # Type of "AnnotatedStatus" is "type[StatusA] | type[StatusB]"
print(AnnotatedStatus)  # typing.Union[__main__.StatusA, __main__.StatusB]

This would grant data (de)serialization libraries unified plumbing to declare discriminated unions, and would allow the type-checker to throw an error at static-analysis time if any member of the Status union does not have a status field, or if any member of the Status union has a status field with a value that is not unique among all members of the union.

The latter hinges on type-checkers being able to implicitly detect disjoint unions. See pyright#5933 for more information on mutable field invariance and implicit disjoint unions.

Motivation

When I'm writing JSONSchemas or deserializing in event-driven architectures, and I'm iterating fast, I gravitate toward easily being able to add new members to a union.

I usually do this in one of two ways. Either:

While this approach feels more magical than maintaining a static union yourself, it facilitates an accessible plug-in architecture, where users can implement new subclasses of the parent class in a new file, and have it automatically become part of the union without deeply understanding the internals of the library.

erictraut commented 1 year ago

This concept was discussed at some length in this thread.

irgolic commented 1 year ago

Thanks for the link @erictraut! A few thoughts from reading that thread:

The @sealed decorator was faced with backlash

It seems like the consensus in the thread was that @sealed should not exist in its proposed fashion, namely because there's no precedent for restricting type behavior to a module.

The decorator was inspired by a couple of different languages:

Scala > Scala has [sealed types](https://docs.scala-lang.org/tour/pattern-matching.html#sealed-types). > ```scala > sealed trait Device > case class Phone(model: String) extends Device: > def screenOff = "Turning screen off" > > case class Computer(model: String) extends Device: > def screenSaverOn = "Turning screen saver on..." > > > def goIdle(device: Device): String = device match > case p: Phone => p.screenOff > case c: Computer => c.screenSaverOn > ``` > > It throws an error if extending the type from another file, > > ```scala > case class Telepathy(message: String) extends Notification > ^ > Cannot extend sealed trait Notification in a different source file > ``` > and ALSO throws an error on non-exhaustive pattern matching of the type (like match's `assert_never` trick). > ```scala > def showNotification(notification: Notification): String = > notification match > case Email(sender, title, _) => > s"You got an email from $sender with title: $title" > case SMS(number, message) => > s"You got an SMS from $number! Message: $message" > ``` > ```scala > match may not be exhaustive. > > It would fail on pattern case: VoiceRecording(_, _) > ```
Kotlin > Kotlin has [sealed classes and interfaces](https://kotlinlang.org/docs/sealed-classes.html). > > ```kotlin > sealed interface Error > > sealed class IOError(): Error > > class FileReadError(val file: File): IOError() > class DatabaseError(val source: DataSource): IOError() > > object RuntimeError : Error > ``` > > The key difference between Scala sealed types and Kotlin sealed classes is that **Kotlin allows the subclasses to be scoped to a PACKAGE**, not just a file. > > They describe the key benefit of using these as removing the need for an `else` clause in their `when` expression: > ```kotlin > fun log(e: Error) = when(e) { > is FileReadError -> { println("Error while reading file ${e.file}") } > is DatabaseError -> { println("Error while reading from database ${e.source}") } > is RuntimeError -> { println("Runtime error") } > // the `else` clause is not required because all the cases are covered > } > ``` > > Though Kotlin doesn't have unions, sealed classes are the closest thing they have to them. See [Union types discussion](https://discuss.kotlinlang.org/t/union-types/77/103)

Non-exhaustiveness match check is already supported

There's a neat trick that supports writing a pattern match over a union, and making sure it's exhaustive at static analysis time:

match status:
    case StatusA(s=status):
        ...
    case StatusB(s=status):
        ...
    case _ as x:
        assert_never(x)

If a member gets added to the union and is not handled in the match block, a type error is raised.

This is great, my only concern here is that it's an obscure "trick". Not sure about python's philosophy behind the match statement, but perhaps we could do something similar to Kotlin, and infer the assert_never default case if matching on a DynamicUnion?

TypeScript parity

For completeness, see also what TypeScript does:

TypeScript > You can define a [discriminated union of interfaces](https://www.typescriptlang.org/docs/handbook/2/narrowing.html#discriminated-unions). > > ```typescript > interface Circle { > kind: "circle"; > radius: number; > } > > interface Square { > kind: "square"; > sideLength: number; > } > > type Shape = Circle | Square; > ``` > > The exhaustiveness check happens similarly to python's `assert_never`: > > ```typescript > function getArea(shape: Shape) { > switch (shape.kind) { > case "circle": > return Math.PI * shape.radius ** 2; > case "square": > return shape.sideLength ** 2; > default: > const _exhaustiveCheck: never = shape; > return _exhaustiveCheck; > } > } > ``` > > However, TypeScript also allows you to [define discriminated unions more functionally](https://www.typescriptlang.org/docs/handbook/typescript-in-5-minutes-func.html#discriminated-unions): > > ```typescript > type Shape = > | { kind: "circle"; radius: number } > | { kind: "square"; x: number } > | { kind: "triangle"; x: number; y: number }; > ``` > > This looks clean and elegant, but I don't think we can use it as inspiration for a base class/union solution. It does highlight a need for easier syntax in defining unions.

It seems like manually declaring a union with all its members and using assert_never for exhaustive matching is pretty good already, and TypeScript supports a similar system.

Where python lacks parity with TypeScript is an easier syntax for writing discriminated unions. Herein lies one of the core differences between python and TypeScript – the latter has anonymous types as first-class citizens.

type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "square"; x: number }
  | { kind: "triangle"; x: number; y: number };

This syntax makes sense in the context of TypeScript, but not in the current form of python. Depending on how inline syntax for TypedDict resolves, this may become viable.

However, I believe DynamicUnion and DiscriminatedUnion are more pythonic; they would simplify, formalize, and unify already established patterns in the python ecosystem.

Let me know what you think, or if you believe we should move this discussion to another platform like the mailing list 🙏 Thanks

erictraut commented 1 year ago

Pyright already supports an option called reportMatchNotExhaustive. When this option is enabled, pyright reports non-exhaustive match statements without the need for an assert_never. Mypy hasn't yet implemented such an option, but it is an open feature request.

# pyright: reportMatchNotExhaustive=true

from typing import Literal
from pydantic import BaseModel

class StatusBase(BaseModel):
    status: str

class StatusA(StatusBase):
    status: Literal["a"]

class StatusB(StatusBase):
    status: Literal["b"]

class StatusC(StatusBase):
    status: Literal["c"]

Status = StatusA | StatusB | StatusC

def func(status: Status) -> None:
    match status: # Pyright error: Cases within match statement do not handle all values; Unhandled type: "StatusC"
        case StatusA():
            ...
        case StatusB():
            ...

Pyright also supports experimental support for inline syntax for TypedDict. I added this support to inform the discussion about this feature. I recommend not taking a dependency on it because it's likely to change or be removed depending on the final resolution of this topic, but you can play with it to get a sense for what this approach might feel like. Here's what your example would look like with inlined TypedDict support and PEP 695 syntax.

# pyright: enableExperimentalFeatures=true

from typing import Literal

type Shape = (
  dict[{ "kind": Literal["circle"], "radius": float }] |
  dict[{ "kind": Literal["square"], "x": float }] |
  dict[{ "kind": Literal["triangle"], "x": float, "y": float }]
)
JelleZijlstra commented 1 year ago

Non-exhaustiveness match check is already supported as of python3.11

Just for completeness, there's really nothing new in Python 3.11 here. We added typing.assert_never in 3.11, but (a) you can use it on any other supported Python version with typing_extensions and (b) it's not treated specially by type checkers, you can write def assert_never(x: NoReturn) -> NoReturn: in your own code and it will work the same.

What we did in 3.11 (put this in typing) does help address your point that this is an "obscure trick". Hopefully now that it is in typing it will be more discoverable.

irgolic commented 1 year ago

Sorry for the delay in my reply :)

Thanks @JelleZijlstra , I thought match was necessary for the trick but now I'm realizing you could probably achieve the same with an if/elif chain. Great job on getting assert_never in, it's definitely nicer to have standardized syntax for this over rolling your own NoReturn sentinel.

And thanks @erictraut for reportMatchNonExhaustive and the experimental inline TypedDict unions. Was the type keyword preceding Shape on line 5 a typo or intended? Interested to see how people adopt it, and how it complements unions of objects. Similar to protocols, just for properties, if I understand correctly?

I thought to document python/TypeScript parity regarding this feature because it felt sensible given the comparison to Scala and Kotlin. However, my use case extends beyond TypeScript-style TypedDict unions.

The same with a metaclass

Hoping to better illustrate my use case by highlighting an alternate solution...

I was taught that you should really only be using a metaclass if you're:

Is there a canonical metaclass implementation that registers class declarations of a base class? Any time I did it, it looked something like this:

classes = {}

class MyMetaclass(type):
    def __new__(meta, name, bases, attrs):
        cls = super().__new__(meta, name, bases, attrs)
        if name != "MyBaseClass":
            classes[name] = cls
        return cls

class MyBaseClass(metaclass=MyMetaclass):
    pass

class A(MyBaseClass):
    pass

class B(MyBaseClass):
    pass

print(classes)  # {'A': <class '__main__.A'>, 'B': <class '__main__.B'>}

As described in the OP motivation, and as implemented in Kotlin (see dropdown), this paradigm allows for declaring subclasses in separate files.

And the subclasses need not only be dataclasses, they can fill abstract methods too. It’s an easily accessible way to add new functionality in a plugin-based architecture, without

Otherwise…

Is the only suggested pattern for fulfilling this use case to manually define the Union?

If there's no way to statically detect a union of a dynamic collection of objects, neither as a union of subclasses, nor as registered by a metaclass, could there be a way to raise an error at static-analysis time if a subclass is declared, but not written down as part of a union's definition? Ideally I would also like to have the ability to raise an error if a subclass is declared outside of a specified file or package, but given how the @sealed decorator flopped, I'm not sure that would be accepted.

erictraut commented 1 year ago

Is the only suggested pattern for fulfilling this use case to manually define the Union?

That's the pattern I recommend. I use it extensively in TypeScript and Python, and I find that it works great, especially when paired with exhaustion verification techniques.

Any mechanism that involves dynamic registration is going to be a problem for static type analysis, especially if those registrations can be performed anywhere by any code. This is why, for example, ABC.register is not supported by any Python type checkers.

irgolic commented 1 year ago

Alright, thanks for the confirmation.

Regarding the DiscriminatedUnion proposal, given that PEP 727 got added to typing_extensions, would it make sense to add something similar for annotating a union type as discriminated? It would allow type-checkers to throw an error on the line of declaration instead of raising an exception at run-time if the type is used as part of a model.
Though I'm not a maintainer of a data serialization library, really it would be nice to hear from @tiangolo or @samuelcolvin if they think it would be useful.

Unrelated sidenote regarding inlined typed dicts > ```python > # pyright: enableExperimentalFeatures=true > > from typing import Literal > > type Shape = ( > dict[{ "kind": Literal["circle"], "radius": float }] | > dict[{ "kind": Literal["square"], "x": float }] | > dict[{ "kind": Literal["triangle"], "x": float, "y": float }] > ) > ``` > > This will ultimately encourage errors similar to those inferred by [pylyzer](https://github.com/mtshiba/pylyzer) to come to light. > > ![image](https://raw.githubusercontent.com/mtshiba/pylyzer/main/images/analysis.png) > > Are you planning to add similar inference capabilities to pyright?
samuelcolvin commented 1 year ago

1️⃣ Dynamic Unions

I'm not sure your proposal makes much sense to me, surely you should just make StatusBase an ABC or a protocol - no need for a union.

2️⃣ Discriminated unions

:+1: :+1: :+1: :+1:

Yes we would love this, I haven't (yet) read through the whole of this issue, let me know if you need me to read it thoroughly, and or respond to specific points.

I actually asked for it at the typing summit at pycon, in that case it was actually to help mypy's performance - see https://github.com/python/mypy/issues/14034. But it would also be very helpful in pydantic itself.

A few unordered thoughts:

JelleZijlstra commented 1 year ago

The policy of typing-extensions is to add new objects when there is a PEP for them. We'll add DiscriminatedUnion if and when there is such a PEP.

I haven't thought about DiscriminatedUnion much recently, but I saw Samuel's talk at PyCon and as I recall it wasn't clear to me how this feature would affect type checker behavior, since a sufficiently strong type checker could already discriminate unions this way, even without the annotations. If a PEP is going to propose DiscriminatedUnion, it will have to be clear about how this new construct is different from Union from a type checker's perspective.

adriangb commented 1 year ago

I agree with both @samuelcolvin and @JelleZijlstra: I think Pydantic would benefit from a shared Discriminator marker for Annotated that works 3ith other libraries, but (1) Pydantic supports much more complex things than type checkers would (callables, the AliasPath stuff) and (2) like Jelle said it's unclear what benefit this would have for type checkers aside from performance (and in the end mypy solved the issue without this; I'm not a type checker author so I can't say if it would help but they all seem to think it won't so I'll have to go with that).

So overall I don't see a huge benefit to adding anything to typing{_extensions} or annotated-types.

irgolic commented 1 year ago

DiscriminatedUnion mostly made sense to me in the context of this proposal (including DynamicUnion). If there's no interest for DynamicUnion, the Discriminator marker type to be used with Annotated seems more elegant.

From a type-checking perspective, I'd just want for there to be an error raised upon declaring the type marked as discriminated.

Pass > ```python > class A(BaseModel): > prop: str > > class B(BaseModel): > prop: str > > MyUnion = Union[A, B] > > MyDiscriminatedUnion = Annotated[MyUnion, Discriminator("prop")] > ```
Fail > ```python > class A(BaseModel): > prop: str > > class B(BaseModel): > prop: str > > class C(BaseModel): > not_prop: str > > MyUnion = Union[A, B, C] > > MyDiscriminatedUnion = Annotated[MyUnion, Discriminator("prop")] # static analysis error raised here > ```

Not sure what the precedent is on static analysis over marker types. Sounds like a great avenue for community plugins.

By the way, why did doc go into typing_extensions and not into annotated-types?

samuelcolvin commented 1 year ago

Your examples would fail at runtime with Pydantic I believe, if that helps.

By the way, why did doc go into typing_extensions and not into annotated-types?

Because it had a PEP.

samskiter commented 1 year ago

Just to throw some more cents in here (and while we're comparing to other languages). I come from Swift programming where 'enum' acts as a really powerful discriminated union. It's the number 1 feature I miss from Swift as it allows you to be super expressive and type safe with return types, data etc etc.

In Swift a discriminated union is done with an 'enum with associated value'. I think it's quite an intuitive way to represent the feature as we're already familiar with the idea of an Enum having a limited number of options and 'switching' across them (I see a bunch of thoughts about about using match in Python which I think is perfect for this concept...).

Example:

enum Barcode {
    case upc(Int, Int, Int, Int)
    case qrCode(String)
}
var productBarcode = Barcode.upc(8, 85909, 51226, 3)

switch productBarcode {
case .upc(let numberSystem, let manufacturer, let product, let check): // here the .upc union is matched and it's inner values are assigned out to the 4 new variables
    print("UPC: \(numberSystem), \(manufacturer), \(product), \(check).")
case .qrCode(let productCode):
    print("QR code: \(productCode).")
}

Multiple values in a case are like having a tuple in python:

Barcode = tuple[int, int, int, int] | String

What's interesting is the ability to mix a 'valueless' enum with those with a value:

enum BlogPostStatus {
    case draft
    case published(Date)
    case removed(String)
}

...the ability to have the same type discriminated:

enum SomeStringRequestResponse {
    case success(String)
    case failure(String)
}

.... the abiility to name the values for better readability:

enum Pizza {

  case small(inches: Int) // without 'inches' here, you might be left guessing what 'int' means
}

We could improve Barcode readabilty like this:

enum Barcode {
    case upc(numberSystem: Int, manufacturer: Int, product: Int, check: Int)
    case qrCode(String)
}

And to do more complex matching in a switch statement using the 'where' clause:

func handle(_ error: Error) {
    switch error {
    // Matching against a group of offline-related errors:
    case URLError.notConnectedToInternet,
         URLError.networkConnectionLost,
         URLError.cannotLoadFromNetwork:
        showOfflineView()
    // Matching against a specific error:
    case let error as HTTPError where error == .unauthorized:
        logOut()
    // Matching against our networking error type:
    case is HTTPError:
        showNetworkErrorView()
    // Fallback for other kinds of errors:
    default:
        showGenericErrorView(for: error)
    }
}

Not to mention all the usual niceties of encapsulation that an Enum provides (utility functions etc):

enum CustomTextFieldTypes {

  case cardType
  case cardNumber
  case cardExpiryDate
  case cardName
  case ccvNumber

  func inputCardNumber(cardNumber: String!, cardNumberTextField: XCUIElement?) {
    if case .cardNumber = self {
        cardNumberTextField?.typeText(cardNumber)
    }
  }
}

These are all 1st class language features and have made using other languages feel a little jarring/limiting for me and others I know - either indicating the Swift enum has become a crutch or something very useful... ;)

I'm not knowledgeable or involved enough with the Python language to think about writing a PEP or contributing to implementation or understand the nuances of how proposals interact with other language features, but as a 'user' this thread makes me excited for the direction of the language and I wanted to show the Swift perspective to hopefully act as inspiration/conversation :)

irgolic commented 10 months ago

How Documentation Metadata in Typing resolves will likely inform the resolution for this issue