Open irgolic opened 1 year ago
This concept was discussed at some length in this thread.
Thanks for the link @erictraut! A few thoughts from reading that thread:
@sealed
decorator was faced with backlashIt seems like the consensus in the thread was that @sealed
should not exist in its proposed fashion, namely because there's no precedent for restricting type behavior to a module.
The decorator was inspired by a couple of different languages:
match
check is already supportedThere's a neat trick that supports writing a pattern match over a union, and making sure it's exhaustive at static analysis time:
match status:
case StatusA(s=status):
...
case StatusB(s=status):
...
case _ as x:
assert_never(x)
If a member gets added to the union and is not handled in the match
block, a type error is raised.
This is great, my only concern here is that it's an obscure "trick". Not sure about python's philosophy behind the match
statement, but perhaps we could do something similar to Kotlin, and infer the assert_never
default case if matching on a DynamicUnion?
For completeness, see also what TypeScript does:
It seems like manually declaring a union with all its members and using assert_never
for exhaustive matching is pretty good already, and TypeScript supports a similar system.
Where python lacks parity with TypeScript is an easier syntax for writing discriminated unions. Herein lies one of the core differences between python and TypeScript – the latter has anonymous types as first-class citizens.
type Shape =
| { kind: "circle"; radius: number }
| { kind: "square"; x: number }
| { kind: "triangle"; x: number; y: number };
This syntax makes sense in the context of TypeScript, but not in the current form of python. Depending on how inline syntax for TypedDict resolves, this may become viable.
However, I believe DynamicUnion and DiscriminatedUnion are more pythonic; they would simplify, formalize, and unify already established patterns in the python ecosystem.
Let me know what you think, or if you believe we should move this discussion to another platform like the mailing list 🙏 Thanks
Pyright already supports an option called reportMatchNotExhaustive
. When this option is enabled, pyright reports non-exhaustive match
statements without the need for an assert_never
. Mypy hasn't yet implemented such an option, but it is an open feature request.
# pyright: reportMatchNotExhaustive=true
from typing import Literal
from pydantic import BaseModel
class StatusBase(BaseModel):
status: str
class StatusA(StatusBase):
status: Literal["a"]
class StatusB(StatusBase):
status: Literal["b"]
class StatusC(StatusBase):
status: Literal["c"]
Status = StatusA | StatusB | StatusC
def func(status: Status) -> None:
match status: # Pyright error: Cases within match statement do not handle all values; Unhandled type: "StatusC"
case StatusA():
...
case StatusB():
...
Pyright also supports experimental support for inline syntax for TypedDict. I added this support to inform the discussion about this feature. I recommend not taking a dependency on it because it's likely to change or be removed depending on the final resolution of this topic, but you can play with it to get a sense for what this approach might feel like. Here's what your example would look like with inlined TypedDict support and PEP 695 syntax.
# pyright: enableExperimentalFeatures=true
from typing import Literal
type Shape = (
dict[{ "kind": Literal["circle"], "radius": float }] |
dict[{ "kind": Literal["square"], "x": float }] |
dict[{ "kind": Literal["triangle"], "x": float, "y": float }]
)
Non-exhaustiveness match check is already supported as of python3.11
Just for completeness, there's really nothing new in Python 3.11 here. We added typing.assert_never
in 3.11, but (a) you can use it on any other supported Python version with typing_extensions
and (b) it's not treated specially by type checkers, you can write def assert_never(x: NoReturn) -> NoReturn:
in your own code and it will work the same.
What we did in 3.11 (put this in typing
) does help address your point that this is an "obscure trick". Hopefully now that it is in typing it will be more discoverable.
Sorry for the delay in my reply :)
Thanks @JelleZijlstra , I thought match
was necessary for the trick but now I'm realizing you could probably achieve the same with an if
/elif
chain. Great job on getting assert_never
in, it's definitely nicer to have standardized syntax for this over rolling your own NoReturn
sentinel.
And thanks @erictraut for reportMatchNonExhaustive
and the experimental inline TypedDict unions. Was the type
keyword preceding Shape
on line 5 a typo or intended? Interested to see how people adopt it, and how it complements unions of objects. Similar to protocols, just for properties, if I understand correctly?
I thought to document python/TypeScript parity regarding this feature because it felt sensible given the comparison to Scala and Kotlin. However, my use case extends beyond TypeScript-style TypedDict unions.
Hoping to better illustrate my use case by highlighting an alternate solution...
I was taught that you should really only be using a metaclass if you're:
Is there a canonical metaclass implementation that registers class declarations of a base class? Any time I did it, it looked something like this:
classes = {}
class MyMetaclass(type):
def __new__(meta, name, bases, attrs):
cls = super().__new__(meta, name, bases, attrs)
if name != "MyBaseClass":
classes[name] = cls
return cls
class MyBaseClass(metaclass=MyMetaclass):
pass
class A(MyBaseClass):
pass
class B(MyBaseClass):
pass
print(classes) # {'A': <class '__main__.A'>, 'B': <class '__main__.B'>}
As described in the OP motivation, and as implemented in Kotlin (see dropdown), this paradigm allows for declaring subclasses in separate files.
And the subclasses need not only be dataclasses, they can fill abstract methods too. It’s an easily accessible way to add new functionality in a plugin-based architecture, without
Is the only suggested pattern for fulfilling this use case to manually define the Union?
If there's no way to statically detect a union of a dynamic collection of objects, neither as a union of subclasses, nor as registered by a metaclass, could there be a way to raise an error at static-analysis time if a subclass is declared, but not written down as part of a union's definition? Ideally I would also like to have the ability to raise an error if a subclass is declared outside of a specified file or package, but given how the @sealed
decorator flopped, I'm not sure that would be accepted.
Is the only suggested pattern for fulfilling this use case to manually define the Union?
That's the pattern I recommend. I use it extensively in TypeScript and Python, and I find that it works great, especially when paired with exhaustion verification techniques.
Any mechanism that involves dynamic registration is going to be a problem for static type analysis, especially if those registrations can be performed anywhere by any code. This is why, for example, ABC.register
is not supported by any Python type checkers.
Alright, thanks for the confirmation.
Regarding the DiscriminatedUnion proposal, given that PEP 727 got added to typing_extensions
, would it make sense to add something similar for annotating a union type as discriminated? It would allow type-checkers to throw an error on the line of declaration instead of raising an exception at run-time if the type is used as part of a model.
Though I'm not a maintainer of a data serialization library, really it would be nice to hear from @tiangolo or @samuelcolvin if they think it would be useful.
I'm not sure your proposal makes much sense to me, surely you should just make StatusBase
an ABC or a protocol - no need for a union.
:+1: :+1: :+1: :+1:
Yes we would love this, I haven't (yet) read through the whole of this issue, let me know if you need me to read it thoroughly, and or respond to specific points.
I actually asked for it at the typing summit at pycon, in that case it was actually to help mypy's performance - see https://github.com/python/mypy/issues/14034. But it would also be very helpful in pydantic itself.
A few unordered thoughts:
Discriminator
marker type, to be used as in foobar: Annotated[Union[Foo, Bar], Discriminator('x')]
, see https://github.com/pydantic/pydantic/issues/7462Discriminator
to be able to take: a function, a tuple/list (indicating the path to the value to use as the discriminator), or AliasPath
/ AliasChoice
which do the same thing more explicitly - obviously this won't be any help to type checkers, I guess in this case they would just need to fallback to the normal union logicDiscriminatedUnion
to typing-extensions yet, I would guess @adriangb could be persuaded to add Discriminator
to annotated-types - this might be the best first step?The policy of typing-extensions is to add new objects when there is a PEP for them. We'll add DiscriminatedUnion
if and when there is such a PEP.
I haven't thought about DiscriminatedUnion much recently, but I saw Samuel's talk at PyCon and as I recall it wasn't clear to me how this feature would affect type checker behavior, since a sufficiently strong type checker could already discriminate unions this way, even without the annotations. If a PEP is going to propose DiscriminatedUnion
, it will have to be clear about how this new construct is different from Union
from a type checker's perspective.
I agree with both @samuelcolvin and @JelleZijlstra: I think Pydantic would benefit from a shared Discriminator
marker for Annotated that works 3ith other libraries, but (1) Pydantic supports much more complex things than type checkers would (callables, the AliasPath stuff) and (2) like Jelle said it's unclear what benefit this would have for type checkers aside from performance (and in the end mypy solved the issue without this; I'm not a type checker author so I can't say if it would help but they all seem to think it won't so I'll have to go with that).
So overall I don't see a huge benefit to adding anything to typing{_extensions} or annotated-types.
DiscriminatedUnion
mostly made sense to me in the context of this proposal (including DynamicUnion
). If there's no interest for DynamicUnion
, the Discriminator
marker type to be used with Annotated
seems more elegant.
From a type-checking perspective, I'd just want for there to be an error raised upon declaring the type marked as discriminated.
Not sure what the precedent is on static analysis over marker types. Sounds like a great avenue for community plugins.
By the way, why did doc
go into typing_extensions and not into annotated-types?
Your examples would fail at runtime with Pydantic I believe, if that helps.
By the way, why did doc go into typing_extensions and not into annotated-types?
Because it had a PEP.
Just to throw some more cents in here (and while we're comparing to other languages). I come from Swift programming where 'enum' acts as a really powerful discriminated union. It's the number 1 feature I miss from Swift as it allows you to be super expressive and type safe with return types, data etc etc.
In Swift a discriminated union is done with an 'enum with associated value'. I think it's quite an intuitive way to represent the feature as we're already familiar with the idea of an Enum having a limited number of options and 'switching' across them (I see a bunch of thoughts about about using match
in Python which I think is perfect for this concept...).
Example:
enum Barcode {
case upc(Int, Int, Int, Int)
case qrCode(String)
}
var productBarcode = Barcode.upc(8, 85909, 51226, 3)
switch productBarcode {
case .upc(let numberSystem, let manufacturer, let product, let check): // here the .upc union is matched and it's inner values are assigned out to the 4 new variables
print("UPC: \(numberSystem), \(manufacturer), \(product), \(check).")
case .qrCode(let productCode):
print("QR code: \(productCode).")
}
Multiple values in a case are like having a tuple in python:
Barcode = tuple[int, int, int, int] | String
What's interesting is the ability to mix a 'valueless' enum with those with a value:
enum BlogPostStatus {
case draft
case published(Date)
case removed(String)
}
...the ability to have the same type discriminated:
enum SomeStringRequestResponse {
case success(String)
case failure(String)
}
.... the abiility to name the values for better readability:
enum Pizza {
case small(inches: Int) // without 'inches' here, you might be left guessing what 'int' means
}
We could improve Barcode
readabilty like this:
enum Barcode {
case upc(numberSystem: Int, manufacturer: Int, product: Int, check: Int)
case qrCode(String)
}
And to do more complex matching in a switch statement using the 'where' clause:
func handle(_ error: Error) {
switch error {
// Matching against a group of offline-related errors:
case URLError.notConnectedToInternet,
URLError.networkConnectionLost,
URLError.cannotLoadFromNetwork:
showOfflineView()
// Matching against a specific error:
case let error as HTTPError where error == .unauthorized:
logOut()
// Matching against our networking error type:
case is HTTPError:
showNetworkErrorView()
// Fallback for other kinds of errors:
default:
showGenericErrorView(for: error)
}
}
Not to mention all the usual niceties of encapsulation that an Enum provides (utility functions etc):
enum CustomTextFieldTypes {
case cardType
case cardNumber
case cardExpiryDate
case cardName
case ccvNumber
func inputCardNumber(cardNumber: String!, cardNumberTextField: XCUIElement?) {
if case .cardNumber = self {
cardNumberTextField?.typeText(cardNumber)
}
}
}
These are all 1st class language features and have made using other languages feel a little jarring/limiting for me and others I know - either indicating the Swift enum has become a crutch or something very useful... ;)
I'm not knowledgeable or involved enough with the Python language to think about writing a PEP or contributing to implementation or understand the nuances of how proposals interact with other language features, but as a 'user' this thread makes me excited for the direction of the language and I wanted to show the Swift perspective to hopefully act as inspiration/conversation :)
How Documentation Metadata in Typing resolves will likely inform the resolution for this issue
When modeling semi-structured data, discriminated unions are an invaluable tool in your arsenal. It allows you to model a variable unit of data, where each unit concisely specifies its type and payload. This is a common pattern in event-driven architectures, where you have a stream of "events", "commands", or "actions", each of which is a discriminated union of event types.
I'd like to propose two new features to the typing module that formalize dynamic and discriminated unions.
1️⃣ Dynamic Unions
Current state
Assume we have a
Status
model that is a discriminated union ofStatusA
andStatusB
:To define a union that is understood by the type-checker at static-analysis time, you declare it by mentioning every member of the union:
Whenever adding a new subclass of
StatusBase
, this declaration must be updated.I often abuse a pattern of lazily declaring a dynamic union instead of maintaining a static one, by tying all the members of the union to the subclasses of a parent class (in this case
StatusBase
):The type-checker is not able to infer the type of
Status
, because it's only compiled at runtime.Proposed feature
I propose a new feature that would allow the type-checker to infer a dynamic union of all
StatusBase
subclasses, and finally error if more subclasses are declared afterward.(Maybe call it
SubclassUnion
instead ofDynamicUnion
?)The type-checker should detect if a subclass of
StatusBase
is declared afterStatus
, and raise an error.This could work like the
@sealed
decorator was proposed to work in the Sealed Typing PEP draft. More useful information found on this email chain.The
@sealed
decorator was designed as part of PEP 622 (structural pattern matching). Think of it as confining a type and its subclasses to a single module, so that the type-checker can dynamically infer a union of the subclasses. These are also known as algebraic data types.I'd like to also explore the possibility of scoping algebraic data types to a package, or a dynamic scope bound between the parent class declaration and the union declaration.
2️⃣ Discriminated unions
Current state
For lack of a formalization of discriminated unions in core python, data (de)serialization libraries turn to their own objects. For example, pydantic specifies a discriminator value with
Field
, and usesAnnotated
to tag the union:This does not raise an error at runtime if any member of the
Status
union does not have astatus
field UNTILAnnotatedStatus
is used as a type on the field of a pydantic model declaration.Annotated
is an "escape hatch" that allows clients to express things that type-checkers might not understand.However, there is a push by third-party libraries for objects in the
typing
module that would canonically be used with theAnnotated
type. It signals a willingness to move from redundant proprietary internals toward ubiquitous core python. See PEP 727 and related discussion.Proposed feature
I propose a new feature that would allow the type-checker to infer the type of
AnnotatedStatus
, by declaring it as a dynamic discriminated union:This would grant data (de)serialization libraries unified plumbing to declare discriminated unions, and would allow the type-checker to throw an error at static-analysis time if any member of the
Status
union does not have astatus
field, or if any member of theStatus
union has astatus
field with a value that is not unique among all members of the union.The latter hinges on type-checkers being able to implicitly detect disjoint unions. See pyright#5933 for more information on mutable field invariance and implicit disjoint unions.
Motivation
When I'm writing JSONSchemas or deserializing in event-driven architectures, and I'm iterating fast, I gravitate toward easily being able to add new members to a union.
I usually do this in one of two ways. Either:
base.py
, each of the subclasses in a separate file, and the union in the module’s__init__.py
While this approach feels more magical than maintaining a static union yourself, it facilitates an accessible plug-in architecture, where users can implement new subclasses of the parent class in a new file, and have it automatically become part of the union without deeply understanding the internals of the library.