python / typing-council

Decisions by the Python Typing Council
43 stars 3 forks source link

Typing spec update for enumerations #11

Closed erictraut closed 5 months ago

erictraut commented 10 months ago

I'd like to request that the TC consider adoption of a new chapter in the typing spec that spells out type checking behaviors related to enum classes.

Links to PR & Discussion The PR can be found here. The latest draft incorporates feedback from PR reviews and the discussion.

The discussion can be found here.

TC Sign-off

Current Type Checker Behaviors The proposed chapter specifies several required behaviors for type checkers along with some optional behaviors.

The latest published version of pyright (1.1.347) has about eight areas where it fails to conform with the required behaviors. Version 1.1.348 of pyright will address all of these.

The latest published version of mypy (1.8.0) mostly conforms to the proposed spec, but it doesn't yet implement support for enum.member and enum.nonmember, which were introduced in Python 3.11. It also doesn't enforce naming consistency when the functional form of Enum is used. And it doesn't enforce the type of the _value_ attribute in all cases if _value_ has an annotated type.

I haven't done a detailed analysis of pyre or pytype, but I assume that they largely conform with the required parts of the proposed spec.

Controversial Issues There is one potentially controversial issue raised in the proposed spec. It has to do with the way enum members and non-member attributes are delineated in a type stub. Historically, there has been no unambiguous way to differentiate between enum members and non-member attributes within a type stub.

Typeshed stubs (and presumably stubs outside of typeshed) are inconsistent in how they define enums.

class E1(Enum):
    a = 1  # This is clearly intended to be a member

class E2(Enum):
    a: int  # This is ambiguous; is it a member of a non-member attribute?

class E3(Enum):
    a = 1
    b: int  # ???

The proposed wording clears up this ambiguity and indicates that stubs should follow the same rules as non-stub files for differentiating between enum members and non-member attributes, but it allows the member value to be replaced with a ....

Fixing all of typeshed's stubs to conform with this updated spec will be relatively easy. I was able to do it in about 10 minutes, and the resulting changes are backward compatible with existing type checker implementations. However, once type checkers are updated to conform to the proposed spec, they will interpret older stubs differently than they do today. This could result in some temporary pain for users.

I don't have a good solution to this backward compatibility issue other than to recommend that type checkers may want to hold off on that part of the change while type stubs are updated.

JelleZijlstra commented 10 months ago

To get some more confidence on the compatibility issue, could we first change typeshed to align it with the proposed spec? That is, make a PR for the change you are alluding to, and ensure it passes typeshed's CI and can be merged without major disruption. If that exposes any problems in existing type checkers, we should consider whether to change the spec. If everything is fine, we can accept this proposal.

erictraut commented 10 months ago

Yes, that's a good suggestion. I did a variation of that experiment in the pyright repo, where I modified pyright's vendored copy of typeshed. I can use that as the basis for an official typeshed PR.

erictraut commented 10 months ago

@JelleZijlstra, here's the typeshed PR. I'm hitting what appears to be a bug in stubtest where it's not able to deal with one of the more complex enum classes. The mypy_primer results are clean, which is encouraging.

hauntsaninja commented 10 months ago

Yeah, looks like stubtest doesn't handle that correctly, logic I had for enums was simplistic. I filed https://github.com/python/mypy/issues/16806 , I recommend adding ssl.Purpose to the stubtest allowlist for now.

JelleZijlstra commented 10 months ago

I guess I noticed a different error in the stubtest output, submitted python/mypy#16807 to fix it.

JelleZijlstra commented 10 months ago

I would like to hold off on accepting this section into the spec until we're able to merge python/typeshed#11299. I think it's important for typeshed to be compliant with the spec and it looks like that's not quite possible yet.

erictraut commented 7 months ago

This chapter has been stuck in limbo for quite some time. I think the holdup is mainly about one section. I wonder if I should just remove that section and try to get the remainder approved. We could then go back and try to add the remaining section at a later time.

If the main blocker is mypy's stubtest functionality, is that going to be addressed in the soon-to-be-released mypy 1.10 release? Has it already been fixed?

@JelleZijlstra, any thoughts or advice?

hauntsaninja commented 7 months ago

If we're referring to https://github.com/python/mypy/pull/16807, that should already be released in 1.9. If there's something else, let me know — while I don't do mypy releases in general, I'm happy to make point releases with cherry picks to unblock typeshed / council work

gvanrossum commented 7 months ago

For me personally, the main blockers are (a) time, and (b) my personal dissatisfaction with the enum module -- I find its semantics too convoluted and too reliant on metaclasses.

JelleZijlstra commented 7 months ago

We're now almost ready to merge the typeshed change.

The backward compatibility issue gives me pause, though. It's good that we're able to update typeshed to conform with the proposed spec, but I'm sure there are a lot of other existing stubs written under the assumption that x: int creates an enum member. To recapitulate, the problem is with code like this:

import enum

class Sig(enum.Enum):
    ABRT: int

reveal_type(Sig.ABRT)  # mypy: Literal[Sig.ABRT]; pyright: int

With this spec change, we're blessing pyright's behavior and telling mypy to make a change.

@erictraut has pyright always had this behavior, or was it a recent change? And have you received any bug reports complaining about this behavior? (A cursory search on the pyright issue tracker didn't turn up any.) If there haven't been too many complaints, that's an indication that this will be a relatively easy change for mypy to make.

A few of the behaviors proposed here are rather subtle (e.g. the _value_ trick). It would be good to add a section in the typing user guide describing those. I share @gvanrossum's feeling that the enum module is too complicated and has too many features; unfortunately that's not something we can address here.

gvanrossum commented 7 months ago
reveal_type(Sig.ABRT)  # mypy: Literal[Sig.ABRT]; pyright: int

At runtime it's a NameError, and list(Sig) doesn't include such attributes. So I'd say the mypy answer is definitely wrong, right? So why would anyone rely on mypy's answer?

JelleZijlstra commented 7 months ago

Usually it'd be written this way in a stub. For example, typeshed's signal.pyi currently has:

class Signals(IntEnum):
    SIGABRT: int

However, both mypy and pyright currently agree that Signals.SIGABRT is an enum member. Maybe pyright special-cases stubs right now?

erictraut commented 7 months ago

Has pyright always had this behavior, or was it a recent change?

Pyright has had this behavior for about three years now. As you guessed, pyright acts this way only for ".py" files. For stub files, its mirrors mypy's behavior. This was necessary because of the way typeshed defines enums. If we stick with mypy's behavior, we have no way of properly typing enums that include both members and non-member attributes. I guess we could say "we don't care about those cases", but I'm uncomfortable taking that stance because I've encountered many cases in real-world code that would be affected by this limitation.

Pyright doesn't currently implement the _value_ trick. That was an idea I came up with when writing this chapter. I'm OK dropping that concept if we think it's too subtle or unnecessary.

I agree that the enum module is too complicated and has too many features. More of these features are added every release, and I'm surprised each time about them — always after the fact. It feels like there's some missing process here. Who is adding these? Shouldn't these changes be going through some open review/spec process?

I understand your concerns about backward compatibility. As with all of the typing spec clarifications, my goal is to work toward a good long-term solution even if that means a small amount of short-term pain. The proposal under review is my best attempt to come up with such a solution. I'm open to other suggestions if you can think of a way that we can land in a good place long term and avoid short-term compatibility issues.

hauntsaninja commented 7 months ago

I think I'm in support of breaking backward compatibility here. Things should fine in non-stubs, which limits how bad things can get.

I think member = cast(type, ...) could be a better alternative to the _value_ trick? If so, it would deal with the heterogeneous members case @AlexWaygood was concerned about.

For stubs, typeshed should be a big chunk of usage. We should update stubgen to output the new enums. It's unlikely for extension module stubs to use enums (maybe other than some PyO3 use case). It would be useful for us to know what the most downloaded non-typeshed stubs packages on PyPI are, and we can work with them to see if they're affected. I scrolled through a few pages of code search of stubs containing enum and many of them use plain assignment.

AlexWaygood commented 7 months ago

If we stick with mypy's behavior, we have no way of properly typing enums that include both members and non-member attributes.

The way we've always done this in typeshed in the past has been to "pretend" that the non-member attributes are properties, e.g. https://github.com/python/typeshed/blob/7c8e82fe483a40ec4cb0a2505cfdb0f3e7cc81d9/stdlib/http/__init__.pyi#L14. This isn't fully accurate w.r.t. how enums work at runtime, but as far as I know we've never received any issues or informal complaints about the way we've always done this up to now. I still struggle to understand the motivation for these changes to how stubs are meant to work, given the backwards-compatibility implications, and the fact that it would mean heterogeneous enums would be inexpressible in stubs in some instances. ISTM we'd just be trading one kind of lack of expressiveness for another, which doesn't seem to me to be worth the churn.

hauntsaninja commented 7 months ago

I've updated stubgen in https://github.com/python/mypy/pull/17125 , like the typeshed PR, it seems reasonable to do either way.

@AlexWaygood what did you think of my member = cast(type, ...) suggestion for the case where stubs can't inline the value? Maybe we don't have to trade lack of expressiveness.

(re motivation: it does appear to me that choosing "we don't care about the distinction between members and non-members" would be sad but probably viable, since from a quick skim I was only able to find two mypy issues. But the proposed change does match the behaviour at runtime more closely and I'd still like to be able to express those things)

AlexWaygood commented 7 months ago

The cast() idea is okay; I could get behind that.


Alternative idea, slightly more blue-sky: Python 3.11 added the enum.member and enum.nonmember decorators. I wonder whether we could repurpose those so that they also work as special forms in annotations, e.g.

class Foo(Enum):
    x: member[int]  # explicitly declare an enum member
    y: nonmember[str]  # explicitly declare an enum non-member

We'd be special-casing a symbol that comes from outside the typing module for use in annotations, but there's precedent for that in e.g. dataclasses.InitVar. It seems to me like it would be pretty elegant, and would be fully expressive. We could backport the symbols in typing_extensions. And it would be backwards-compatible with the current ways of doing things, as we wouldn't disallow the existing way of defining enums that most stubs currently use; we'd just be adding a new, more expressive way.

hauntsaninja commented 6 months ago

I signed off.

As mentioned above, I think the backwards compatibility break is quite feasible here. I merged my stubgen PR, so both typeshed and stubgen should be updated for it.

The _value_ trick for stubs is a little subtle and I'd personally be happy to drop it. I'd prefer to use member = cast(type, ...), since a) it lets you express more things, b) it should work "for free" (and I would advocate we do this if it comes up in typeshed)

carljm commented 6 months ago

I read the chapter. Left a few comments/questions, but I'm OK with its inclusion as-is.

I feel like the _value_ "trick" should also come for free: enum instances do have a _value_ attribute, type checkers already have to understand that the value assigned a member is assigned to that instance attribute, and _value_: int is the normal way to specify the type of an instance attribute. I would rather read (or write) a stub with _value_: int than one that uses RED = cast(int, ...) on every member definition.

hauntsaninja commented 6 months ago

(I agree the _value_ trick is a trick that makes sense. What I meant by "for free" was that neither users nor type checkers need to know anything more about enum to get the semantics Alex wanted to be able to express. I believe at the original time of posting, neither myself nor mypy nor pyright understood _value_ enough)

electric-coder commented 3 months ago

I agree that the enum module is too complicated and has too many features.

I disagree, the enum module is excellent and comes with a minimal feature set corresponding to the most commonly requested use cases on Stack Overflow. It's sound and if anything the typing tools have never caught up with even the most basic functionalities of the module.

Reading the spec that's being defined here I get a feeling it would have been adequate back in 2017 but in 2024 it's just lagging behind of most real world uses enum has.

electric-coder commented 3 months ago
reveal_type(Sig.ABRT)  # mypy: Literal[Sig.ABRT]; pyright: int

pyright is wrong. It should obviously be:

reveal_type(Sig.ABRT)  # mypy: Literal[Sig.ABRT]
reveal_type(Sig.ABRT.value)  # mypy: int

Enum members are themselves types and instances of the defining class, so conflating the type of the value with the type of the member is by definition nonsense.

hauntsaninja commented 3 months ago

The example you're quoting is specifically about the behaviour in stubs. At runtime there will always be values associated with enum members.

I recommend reading the content of the spec and looking at the test cases in the conformance test suite. If after doing so you still have concerns you'd like to raise, I recommend doing so at discuss.python.org — this issue isn't really the right location to do so.