python / cpython

The Python programming language
https://www.python.org
Other
62.84k stars 30.09k forks source link

Specialize access to Enum attributes #95004

Open markshannon opened 2 years ago

markshannon commented 2 years ago

Currently we don't specialize loading of enum attributes.

class Color(Enum):
     RED = "Red"

Color.RED   # This load is not specialized

We can take advantage of the fact that type(Color.RED) == Color when specializing. Since we check the version number of Color in LOAD_ATTR_CLASS, we are implicitly checking the version of type(Color.RED). If Color.RED is not a descriptor when specialized, it will not be a descriptor when executed. Therefore, we can specialize it in the same way as immutable, non-descriptor class attributes using LOAD_ATTR_CLASS.

@Fidget-Spinner @brandtbucher

Fidget-Spinner commented 2 years ago

Are you working on this?

markshannon commented 2 years ago

Are you working on this?

No. If you want to do this, just assign yourself. It might be worth waiting for 3.11 to calm down a bit first.

brandtbucher commented 2 years ago

I imagine that this will piggyback off of https://github.com/python/cpython/pull/93988, since EnumType defines __getattr__?

ethanfurman commented 2 years ago

__getattr__ is no longer necessary, and will be removed in 3.12 (since I forgot to do it in 3.11).

ethanfurman commented 2 years ago

If Color.RED is not a descriptor when specialized, it will not be a descriptor when executed. Therefore, we can specialize it in the same way as immutable, non-descriptor class attributes using LOAD_ATTR_CLASS.

Even if Color.__dict__['RED'] is a descriptor, its result will still always be the member Color.RED when called on the class. Is that optimizable?

In case it matters, the exact descriptor would be enum.property -- the big difference between it and the standard property is the behavior when called on the class -- either the matching enum member, or an AttributeError.

brandtbucher commented 2 years ago

Even if Color.__dict__['RED'] is a descriptor, its result will still always be the member Color.RED when called on the class. Is that optimizable?

Interesting. Looking at it now, I'm not sure why LOAD_ATTR_CLASS currently only handles methods and non-descriptors. As you note, it should be able to handle anything (at least anything without a descriptor in the metaclass, which we already guard against), including descriptors in the class __dict__.

@Fidget-Spinner?

Fidget-Spinner commented 2 years ago

@brandtbucher it's probably an oversight in how I implemented it. I was being conservative at the time (also I admit just using the direct results from the analyze_descriptor function for simplicity).

Fidget-Spinner commented 2 years ago

Some complications with specialising enums:

We probably need a specialisation for metaclasses, like what Brandt suggested in his other PR.

markshannon commented 2 years ago

Inheriting from enums uses metaclasses. Which we (just) disabled.

All classes have a metaclass. The issue is whether the metaclass is mutable, has overrides, or shadows class attributes. This is much the same as for normal instances and their classes, which is why we have analyze_descriptor

The enum metaclass is a mutable class.

This is the only problem here, AFAICT. The easy fix is to make EnumType immutable. It doesn't need to be mutable. A more general, and much more complex fix is to update the version number of all classes that are instances of a metaclass, when that metaclass changes.

The enum property is a descriptor.

Properties are descriptors. Which property are you talking about?

Fidget-Spinner commented 2 years ago

Properties are descriptors. Which property are you talking about?

This part is wrong. I initially assumed enum members defined __get__ and __set__. I don't think they do.

brandtbucher commented 2 years ago

This part is wrong. I initially assumed enum members defined __get__ and __set__. I don't think they do.

No, I think it's just name and value that do that. Still something to consider, though.

brandtbucher commented 2 years ago

I was being conservative at the time (also I admit just using the direct results from the analyze_descriptor function for simplicity).

We're probably going to want to revisit this soon. Looking at it with fresh eyes, things like the GETSET_OVERRIDDEN checks are incorrect in this context too.

This is the only problem here, AFAICT. The easy fix is to make EnumType immutable. It doesn't need to be mutable. A more general, and much more complex fix is to update the version number of all classes that are instances of a metaclass, when that metaclass changes.

It seems to me that a much simpler solution would just be to version the metaclass too (https://github.com/python/cpython/compare/main...brandtbucher:cpython:load-attr-metaclass).

markshannon commented 2 years ago

It seems to me that a much simpler solution would just be to version the metaclass too

Simpler, but slower. It is probably the best thing to do for now, we can always do the more complex thing later.

ethanfurman commented 2 years ago

The enum metaclass is a mutable class.

This is the only problem here, AFAICT. The easy fix is to make EnumType immutable. It doesn't need to be mutable. A more general, and much more complex fix is to update the version number of all classes that are instances of a metaclass, when that metaclass changes.

Is there anything I can do to make EnumType immutable?

brandtbucher commented 2 years ago

As far as I know, the only way is to set the right flags on the type at the C level. So, short of reimplementing EnumType in C, there probably isn't a simple way to do it without hacks (like exposing a C function somewhere that you can decorate your type with).

brandtbucher commented 2 years ago

So let's just version the metaclass for now. As soon as EnumType removes its __getattr__ function (and we iron out some of the quirks of class attribute specialization), we'll be able to specialize Enum (and many others, like sympy) attribute accesses without resorting to hacks.

I'll open a PR in a bit.

ethanfurman commented 2 years ago

This part is wrong. I initially assumed enum members defined __get__ and __set__. I don't think they do.

No, I think it's just name and value that do that. Still something to consider, though.

In EnumClass.__dict__ the member names, along with value and name and any other enum.propertys assigned by the user, are descriptors. The members themselves are not.

brandtbucher commented 2 years ago

Ah, thanks.

Also, I was wrong earlier: I totally forgot that descriptor __get__ methods still need to be called (without an instance) on class attribute accesses too. So it's not safe to cache any descriptors here, as I previously thought. :(

ethanfurman commented 2 years ago

In EnumClass.__dict__ the member names ... are descriptors.

At least they will be once the performance is improved (which I believe it is even now in 3.12) -- in 3.11 they are only descriptors if the enum class defines a member which matches a property in the inheritance chain:

class ConfusingEnumBase(Enum):
    @enum.property
    def attribute(self):
        return 'example'

class ConfusingEnum(ConfusingEnumBase):
    cls = 1
    decorator = 2
    attribute = 3

In the above, the cls and decorator members live directly in ConfusingEnum.__dict__, while attribute is a descriptor in ConfusingEnum.__dict__.

ethanfurman commented 2 years ago

Also, I was wrong earlier: I totally forgot that descriptor __get__ methods still need to be called (without an instance) on class attribute accesses too. So it's not safe to cache any descriptors here, as I previously thought. :(

Even if the result of calling __get__ on the class never changes? Is there a C hack to mark such things as immutable so it only needs to be called once and then the result cached?

brandtbucher commented 2 years ago

Even if the result of calling __get__ on the class never changes? Is there a C hack to mark such things as immutable so it only needs to be called once and then the result cached?

We don't cache the results of function calls anywhere, since in general that's an extremely fragile semantic break. We do things like caching the descriptors themselves, but not the result of calling any Python code. It also puts us in a really nasty situation that we've so far avoided: storing strong references in the inline caches.

Realistically, the best we can probably do without adding caches is caching the descriptor and inlining the __get__ call. But I'm not sure if that will make enough difference to matter (and we still won't have __getattr__ handling until something like #93988 is in).

warsaw commented 2 years ago

I'm not in an environment where I can research this so I'll just ask: is the enum specialization being discussed here limited only to stdlib enums? I ask because I've been thinking about resurrecting my flufl.enum package (a precursor to the stdlib library). I'd promote it as a limited, simplified enum library. It might not get much use and I'm not sure whether it would even benefit from enum-specific specialization, but I'd like to know if this is a general specification or something that only benefits the stdlib package (not that that's a bad thing).

Fidget-Spinner commented 2 years ago

is the enum specialization being discussed here limited only to stdlib enums

Most likely not. The specializations work on things with the same "shape", "type" or belonging in the same "family" (behavior-wise). So anything specializing stdlib enums would also work for anything that has the same characteristics (ie, mutable metaclass, attribute belonging to a class).

If your enum implementation is simpler, we likely already specialize for it. If you want to check the specialization works, run your enums in a loopy function then pass that to dis.dis(f, adaptive=True) in 3.12. If you see weird LOAD_ATTR opcodes (other than LOAD_ATTR_ADAPTIVE. It means it worked.

gryznar commented 1 year ago

Is it currently possible in a satisfying way to make EnumType immutable? Unfortunately, in the current situation, implementing mixins which are inherited from mutable objects has no sense and implementing immutability on the user's side is very hard (if at all possible).

ethanfurman commented 1 year ago

The above comment is based on this Stackoverflow question. As I mention there, the immutability of EnumType has no bearing on enum members being mutable (and it's the mutability of the members that is the problem).