PEP-0435 has been approved!
Now for lots of code review.
Can you upload the patch as a diff against the CPython repo?
Here it is (I hope ;) .
I don't see docs in the patch, but perhaps that should be a separate patch to make reviewing easier.
Also, Ethan, number the patch files in some way (like pep-435.1.patch, pep-435.N.patch) as they go through rounds of reviews.
OK, I sent another batch of reviews through the code review system - Ethan you should've gotten an email about it.
Regarding module vs. modulename for the extra param in the functional API, Guido rightly points out in bpo-17941 that __module__ is the class attribute, so "module" is a consistent choice.
Incorporated comments.
More adjustments due to review.
Round 4. I wonder if I should have used two digits to number the patches. ;)
By the way, if anyone is willing to take on the documentation, that could be of great help. It shouldn't be very hard since PEP-435 is extremely detailed. The stdlib .rst doc would be the descriptions from PEP-435 plus a reference of the exposed classes which is pretty minimal.
We'll need the documentation patch available by the time the code review is done.
Two requests:
Lose the frame hack. With the explicit "module=name" API available, the implicit fragile magic isn't necessary and leads to Enum instances that may or may not support pickling depending on the implementation. Better to say "if you use the functional API without passing the module name, your Enum instances won't support pickling".
Mike Bayer (SQLAlchemy author) has requested the ability to derive a custom metaclass that turns off the Enums-with-members-are-final subclassing restriction (relaxing the rule that members of an enum are instances of that enum to "members of an enum are instances of that enum, or one of its parent enums", but otherwise keeping the same behaviour as the standard enums).
Barry would probably appreciate this capability, too ;)
We don't need to explicitly support this for the initial patch (besides, as I read the current code, you can get something like this already just by overriding EnumMeta._get_mixins), but we may want to revisit it as a supported subclassing API at some point in the future.
On Fri, May 10, 2013 at 8:09 PM, Nick Coghlan <report@bugs.python.org> wrote:
> Nick Coghlan added the comment:
> Two requests:
> - Lose the frame hack. With the explicit "module=name" API available, the implicit fragile magic isn't necessary and leads to Enum instances that may or may not support pickling depending on the implementation. Better to say "if you use the functional API without passing the module name, your Enum instances won't support pickling".
Strong -1 from me on this. The PEP has been accepted. Feel free to raise this for discussion if you want to change the decision. With all due respect to IronPython, most users can write simpler code and have pickling work just fine.
> - Mike Bayer (SQLAlchemy author) has requested the ability to derive a custom metaclass that turns off the Enums-with-members-are-final subclassing restriction (relaxing the rule that members of an enum are instances of that enum to "members of an enum are instances of that enum, or one of its parent enums", but otherwise keeping the same behaviour as the standard enums).
Nick, after you read the whole code, please consider whether it's not already complex enough even without catering to every other need out there.
The accepted PEP states that the frame hack should be removed: "To support pickling of these enums, the module name can be specified using the module keyword-only argument."
Ergo, if you don't specify it, they cannot be pickled. Explicit is better than implicit, and this hack should not propagate beyond namedtuple (it should actually be deprecated in namedtuple as well).
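To make the distinction concrete, here is a minimal sketch of the two spellings under the PEP-435 functional API (the behaviour of the implicit form is exactly what is being debated):

from enum import Enum
import pickle

# Explicit module name: pickle can always locate the class via __module__.
Animal = Enum('Animal', 'ant bee cat dog', module=__name__)
pickle.loads(pickle.dumps(Animal.ant))  # round-trips fine

# Implicit: per the "explicit is better than implicit" reading of the PEP,
# members of this enum would simply not be picklable; with the frame hack,
# pickling works only when the hack guesses the defining module correctly.
Insect = Enum('Insect', 'fly wasp')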
As far as the second point goes, I had already reviewed the PEP implementation before making the comment (that's why I am reasonably sure you can already do it just by overriding _get_mixins). I see it as similar to the changes that were already made to support autonumbered subtypes.
However, also note that I said we should wait before doing anything about providing a supported mechanism for that customisation. I've now created bpo-17954 to cover a possible refactoring and documentation of that part of the implementation.
Sorry everyone, the frame hack needs to stay. This is not negotiable.
Why we need two ways to do it? If "module=name" is available, what's the rationale for providing the option of leaving it out when pickling support is required? (The only times the frame hack will work to enable pickling, the explicit fix will also work).
Ethan, something is wrong with _StealthProperty. Well, two things. First, it's way too general and can be cut to just what Enum needs. Second, this is enough to make the tests pass:
class _StealthProperty():
    """
    Returns the value in the instance, or the virtual attribute on the class.
    A virtual attribute is one that is looked up by __getattr__, as opposed to
    one that lives in __class__.__dict__.
    """
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, objtype=None):
        return self.fget(obj)

    def __set__(self, obj, value):
        raise AttributeError("can't set attribute")

    def __delete__(self, obj):
        raise AttributeError("can't delete attribute")
Now this is fishy because __get__ gets obj=None when called on a class and therefore self.fget(obj) raises an exception. But this gets caught somewhere in your code or tests and ignored. However, the right thing is still returned. I didn't have more time to investigate, but this must be cleaned out.
Also, your test must run within the regrtest framework. Currently I get:
./python -mtest.regrtest test_enum
[1/1] test_enum
test test_enum failed -- Traceback (most recent call last):
File "/home/eliben/python-src/default/Lib/test/test_enum.py", line 245, in test_pickle_enum_function_with_module
self.assertIs(Question.who, loads(dumps(Question.who)))
_pickle.PicklingError: Can't pickle <enum 'Question'>: attribute lookup __main__.Question failed
1 test failed: test_enum
Eli,
The original _StealthProperty checked to see if it was being called on an instance or the class, and if it was the class it invoked __getattr__ to attempt a lookup of an enum member. Your version does not check, but, ironically, the exception raised is AttributeError, so Python calls __getattr__ anyway and finds the virtual enum member.
While this is cool, and it works, I think it's much less mysterious, magical, and downright confusing to keep the original behavior and call __getattr__ from _StealthProperty. On the other hand, it might make somebody think, and that's always good, so I'm happy to leave it your way.
regrtest framework now supported (had to change the module name I was passing in from '__main__' to __name__).
Guido has promised an explanation for why he wants to keep the frame hack once he is back at work next week. To help him target that reply appropriately for my concrete objections to retaining it, I'd like to explain a bit more about why I think it's fundamentally impossible to create a robust stack inspection mechanism for implicit pickling support. Even if a dedicated mechanism for it is added, the information the interpreter needs to make a sensible reliable decision simply isn't there.
Consider the following:
# In utils.py
def enum_helper(name, prefix, fields):
    if isinstance(fields, str): fields = fields.split()
    e = Enum(name, (prefix + field for field in fields))
    return e

# In some other module
from utils import enum_helper
MyEnum = enum_helper("MyEnum", "st_", "mtime atime ctime")
This breaks the frame hack, but would work correctly if enum_helper was redesigned to accept an explicit module name, or if we adopted something like the "name binding protocol" discussed on python-ideas.
If we adopted a simplistic rule of ignoring function scopes and only looking at module and class scopes, then we would break the semantics of the global keyword.
I consider the frame hack fundamentally broken, since there's currently no way the interpreter can distinguish between an incidental assignment (like the "e = Enum..." call in utils.py) and a destination assignment (like the "MyEnum = enum_helper..." call), and if we *do* add a different syntax for "this supports pickling" assignments (like the "def name = expr" syntax on python-ideas), then a name binding protocol makes more sense than implicit contextual magic.
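For illustration, here is a sketch of the redesign mentioned above, where the helper takes an explicit module name (the parameter name is illustrative, not from any posted patch):

from enum import Enum

def enum_helper(name, prefix, fields, *, module):
    # Forward an explicit module name so the resulting Enum is picklable
    # regardless of how deeply the helper is nested in other calls.
    if isinstance(fields, str):
        fields = fields.split()
    return Enum(name, [prefix + field for field in fields], module=module)

# The caller names its own module explicitly:
MyEnum = enum_helper("MyEnum", "st_", "mtime atime ctime", module=__name__)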
I also created bpo-17959 to track a legitimate concern with the current aliasing design, without impacting incorporation of the accepted PEP.
Another post-incorporation proposal (bpo-17961) relating to the values used for the functional API.
After more thought, I think we should leave the shortened version of _StealthProperty as is.
The reasoning being that if _StealthProperty does the __getattr__ lookup itself, and fails, it will raise an AttributeError... and then Python will use __getattr__ again to try and find the attribute.
So I'll add a comment into the shortened version and leave it at that.
Nick, could you open a separate issue for the frame hack discussion, like you did for the other things? You can make this one depend on it, if you feel it's a "blocker", but let's please not intermix more discussion here.
Ethan, I pasted the minimized version to point out the problem; I fully agree it should do the getattr lookup because otherwise it's very difficult to understand what's going on. Let's keep exceptions for actual exceptional situations and explicitly code the normal path.
Also, _MemberOrProperty is probably a more descriptive name.
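A minimal sketch of the explicit version being discussed, assuming the metaclass defines a __getattr__ that resolves virtual members (the class name follows Eli's suggestion; the details are illustrative, not the patch's actual code):

class _MemberOrProperty:
    """Return the computed value on an instance; on the class, explicitly
    route the lookup to the metaclass __getattr__ so a member that shadows
    'name' or 'value' is still found."""
    def __init__(self, fget):
        self.fget = fget
        self.name = fget.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Class access: ask the metaclass for a virtual member directly,
            # rather than relying on an incidental AttributeError falling
            # through to __getattr__.
            return type(objtype).__getattr__(objtype, self.name)
        return self.fget(obj)

    def __set__(self, obj, value):
        raise AttributeError("can't set attribute")

    def __delete__(self, obj):
        raise AttributeError("can't delete attribute")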
Eli: done, created as bpo-17963
Here's the latest patch.
Note that the functional API portion is broken, and I'll get back to that this evening. Please only comment on the working code. ;)
I'll make that _MemberOrProperty name change then as well.
Here's the promised explanation why I want to keep the getframe hack. I'm sure it won't satisfy everyone, but this will have to do.
There are two parts to my argument. TL;DR: (a) by implementing this hack, we will maximize user happiness; (b) I expect that all Python implementations can provide the functionality needed.
So why does this maximize user happiness? First of all, enums are such simple objects that it's a disgrace to have an enum that isn't picklable. But most users are lazy and lack imagination, so if they find they have to do extra work to ensure that something is picklable, they will make a judgment call -- is the extra work to make it picklable worth it? Unfortunately they will try to make this judgment call in a fraction of a second, and with their limited imagination they may well say "I can't imagine ever pickling this", save themselves the work, and move on. If you think more than a second about the decision, you've wasted more time than it takes to type "module=name", so it's going to be a split-second decision. But nevertheless, having to think about it and typing it is a distraction, and when you're "in the zone" you may prefer to spend your time thinking about the perfect names for your enum class and values rather than the red tape of making it picklable.
So, in my mind, it's a given that there will be many enums that are missing the module=name keyword argument. Moreover, it's likely that most of the time this will be fine -- you can get through most days just fine without ever reading or writing a pickle. (For example, I very much doubt that I've ever pickled one of the flag values used by the socket module to define e.g. the address family, socket type, or protocol.)
But now you enter a different phase of your project, or one of your collaborators does, or perhaps you've released your code on PyPI and one of your users does. So someone tries to pickle some class instance that happens to contain an unpicklable enum. That's not a great experience. Pickling and unpickling errors are often remarkably hard to debug. (Especially the latter, so I have privately admonished Ethan to ensure that if the getframe hack doesn't work, the pickle failure should happen at pickling time, not at unpickle time.) Once you've tracked down the source, you have to figure out the fix -- hopefully just typing the error message into Google will link back to a StackOverflow answer explaining the need to say "module=name". But the damage is done, especially if the person encountering the pickling error is not the original author of the code defining the enum. (Often the person downloading and using a package from PyPI has less advanced Python knowledge than the package author, so they may have a hard time debugging the situation.)
You can see how having the getframe hack in place makes life more pleasant for many people -- the package user won't have to debug the pickling problem, and the package author won't have to deal with the bug report and fix.
But what about other Python implementations? Well, TBH, they have plenty of other issues. The reality is that you can't take a large package that hasn't been tested on Jython, or IronPython, or PyPy, and expect it to just work on any of those. Sure, things are getting better. But there are still tons of differences between the various Python implementations (as there are between different versions of CPython), and whether you like it or not, CPython is still the Python version of choice for most people. The long and short of it is that porting any significant package to another implementation is a bit of work, and keeping the port working probably requires adding some set of additional line items to the style guide used by its developers -- don't use feature X, don't depend on behavior Y, always use pattern Z...
However, I don't expect that "always pass module=name when using the enum convenience API" will have to be added to that list. sys._getframe() is way more powerful than what's needed. Even on platforms where sys._getframe() is unimplementable (or disabled by default in favor of a 10% speedup), it should still be possible to provide an API that *just* gets the module name of the caller, at least in case the call site is top-level module code (and for anything else, the getframe hack doesn't work anyway). After all, we're talking about a full Python implementation, right? That's a dynamic language with lots of introspection APIs, and any implementation worth its salt will have to deal with that, even if full-fledged sys._getframe() is explicitly excluded.
So I propose that we use sys._getframe() for the time being, and the authors of Jython, IronPython and PyPy can get together and figure out how to implement something like sys.get_calling_module_name(). They have plenty of time -- at least until they pledge support for Python 3.4. To make it easy for them we should probably add that API to CPython 3.4. And it's fine with me if the function only works if the caller is top-level code in a module.
Which reminds me. Nick offered another use case where using sys._getframe() breaks down: a wrapper function that constructs an enum for its caller. First of all, I think this is a pretty rare use case. But additionally, that wrapper could just use sys.get_calling_module_name(), and everything would be fine.
PS. Whoever adds sys.get_calling_module_name() to CPython, please pick a shorter name. :-)
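For readers following along, the mechanism being debated is roughly the following (a sketch only; the frame depth and helper name are illustrative, not the patch's code). Something in this spirit is also what a narrower sys.get_calling_module_name() could provide.

import sys

def _callers_module_name():
    # The "frame hack": from inside the functional API, peek two frames up
    # (past this helper and the API function) and read the caller's global
    # __name__.  Returns None where sys._getframe is unavailable or the
    # frame can't be reached -- which is the portability concern.
    try:
        return sys._getframe(2).f_globals.get('__name__')
    except (AttributeError, ValueError):
        return None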
From PyPy's perspective we don't really care what you name this particular bikeshed, and it's probably not that important to us (in this particular case).
As far as I know IronPython is the only Python VM that doesn't have _getframe() support by default (you need a CLI flag at startup, IIRC). In PyPy calling _getframe (right now) has a negative performance impact, but it's local to wherever you're calling it, so unless you create Enum in frequent loops (as opposed to just once at module level), this doesn't really matter to us.
If someone wants to invent a narrower, more tailored API for stack introspection, logging is a much better target (right now each call to logging.{info,error,etc.} calls _getframe(), which does have a real performance impact). (We'll eventually invent enough engineering to fix this ourselves, but in the meantime if someone were to paint this bikeshed, we certainly wouldn't object ;).)
I've come across something in the implementation here that I'd like some clarification on. What is the purpose of overriding __dir__ in Enum and EnumMeta? It doesn't change any behavior that I'm aware of, just makes things look a little nicer when someone calls dir() on their Enum. And, in fact, it can make things a little confusing. For example:
>>> class Test(enum.Enum):
... foo = 1
... bar = 2
... baz = 3
...
>>> dir(Test)
['__class__', '__doc__', '__members__', 'bar', 'baz', 'foo']
>>> Test.mro
<built-in method mro of EnumMeta object at 0x01D94D20>
This brings up another interesting case:
>>> class Test2(enum.Enum):
... mro = 1
... _create = 2
...
>>> dir(Test2)
['__class__', '__doc__', '__members__', '_create', 'mro']
>>> Test2.__members__
mappingproxy(OrderedDict([('mro', <Test2.mro: 1>), ('_create', <Test2._create: 2>)]))
>>> Test2['mro']
<Test2.mro: 1>
>>> Test2.mro
<built-in method mro of EnumMeta object at 0x01D90210>
>>> Test2._create
<bound method type._create of <class 'enum.EnumMeta'>>
>>>
From using "mro" or "_create", I would have expected either ValueError or for them to work properly. I don't know whether this should be fixed (one way or the other), documented, or just left alone; those kind of names really shouldn't ever be used anyway. It's something I stumbled across, though, and I just wanted to make sure that those who do have opinions that matter are aware of it :)
Thanks Guido, now that I fully understand your reasoning, I can accept that this is a valid "practicality beats purity" situation.
Got the pickle issues worked out. Added super to the metaclass's __new__. Checking for illegal names of members and raising ValueError if any are found (I know, I know, safety checks! But such an enum is broken from the get-go, so I see no reason to allow those names through).
Thanks everyone for the excellent feedback. I really appreciate it. Hopefully we're almost done! :)
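A sketch of the kind of safety check described above (the reserved set and error wording are illustrative, not the patch's actual code):

# Names that would collide with attributes supplied by the (meta)class.
_INVALID_MEMBER_NAMES = frozenset({'mro', '_create_'})

def _check_member_names(names):
    invalid = _INVALID_MEMBER_NAMES.intersection(names)
    if invalid:
        raise ValueError('invalid enum member name(s): %s'
                         % ', '.join(sorted(invalid)))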
Small nitpick, weakref is imported but not used in the latest patch.
Ethan, just a reminder to write that documentation...
It's basically a stripped down version of PEP-435 (leave all the philosophy and history out), with a few concrete "reference" sections explaining the API precisely.
I tweaked the code a bit (no functionality changes, mostly cleanups and a bit of refactoring). Figured it will be easier to just send an updated patch than another review. The diff from patch 06 can be seen via the Rietveld interface.
Also, after reading the code more carefully, I think we're making a mistake by over-complicating it for the sake of custom enum metaclasses and over-customization (like auto-numbering). The original point Guido raised against auto-numbering was too much magic in the implementation. Well, we already have that in Lib/enum.py - the code is so complex it seems fragile because of the tight coupling with many class- and metaclass-related protocols. Just defining a wholly new enum implementation that does something very specific seems simpler than customizing the existing one.
I'd suggest we stick to the existing Enum + IntEnum, giving up the more complex customizations for now. It can always be added in the future if we see it's very important.
Wow. I definitely felt like an apprentice after reading the changes. Thanks, Eli, that looks worlds better!
Thanks Ethan :)
From my point of view this is LGTM, as long as we keep the following in mind:
We can always add more capabilities to Enum. We can "never" take them away once added, and this complicated code will remain with us forever even if no one ends up using it.
Supporting extensions was one of the things that got Ethan's version through review. So -1 on going back on our promise to support those variants. They have been reviewed and tested just as thoroughly as the rest of the design.
Also, we know for a fact that people plan to use the customisation features.
I'm not sure which promises you're referring to, Nick, and to whom they were made; the only formal promise we made is PEP-435 - and it doesn't mention this extensibility.
I won't argue beyond this comment, since I know I'm part of the minority opinion here. However, I still think this is a mistake.
The most important original goal of Enum (as discussed during the language summit) was to replace all the custom enum implementations with one that is standard. A far-reaching extension mechanism will just make it so we'll have a fleet of bastardized "extended enums", each with its own capabilities, each different from the others. With one standard Enum, when you're reading someone's code and you see:
class Foo(Enum):
    ...
You know very well what Foo is. Restricted extensions like IntEnum and even your @enum.unique are still tolerable because they're explicit:
# enum.unique is standard and says what it is explicitly
@enum.unique
class Foo(Enum):
    ...
But if we open the gates on customization, we'll have:
class Foo(AutoEnum):
    Red, White, Black
And:
class Bar(SomeOtherAutoEnum):
    Red = ...
    White = ...
    Black = ...
And:
class Baz(SomeEvenOtherMagicEnum):
    ...  # whatever goes here
And we're back to square 1, because these Enums are not standard, and each framework will have its own clever customization one will need to understand in order to read code with Enums.
Exposing and documenting the metaclass and customizations of __new__ is a whole coffin for the "there is only one way to do it" decision of stdlib's Enum. It might have been better to just define AutoNumberedEnum, BitfieldEnum and Magic42Enum as part of the enum package in stdlib and be done with it; but this was strongly rejected by others and particularly Guido during the summit and later. Now we're just creating a back-door to get into the same situation.
Eli, what's wrong with having a backdoor? Python is literally *full* of backdoors. I have a feeling that somehow you are trying to build an Enum class that is unpythonic in its desire to enforce some kind of "ideal enum" behavior.
Guido, IMHO back-doors are fine in many cases, just not this one. The way I see it, our main goal here is to collect a bunch of custom implementations of enums under a single umbrella. This is not very different from what was done with OrderedDict and namedtuple at some point. There were probably a bunch of custom implementations, along with more and less commonly used recipes. At some point a single implementation was added to the stdlib, without (AFAICS) major back-doors.
Yes, the Enum case is vastly more complex than either OrderedDict or namedtuple, and there is a multitude of different behaviors that can be anticipated (as the lengthy discussions leading to the acceptance of PEP-435 demonstrated). And yet, I was also hoping to have a single canonical implementation, so that people eventually accept it as "the one". Stdlib modules tend to win over in the long run.
The other point is that I think the implementation could be much simpler without having these back doors. As it stands now, the code is complex and hence brittle. Any change will be difficult to do because we're locked down very strictly by a set of intrusive and deep, yet externally "promised" interfaces. The same can be said, again, about OrderedDict and namedtuple, the code of which is very straightforward.
Maybe I'm blowing this out of proportion, maybe not. I'm not sure. As I said, I don't want to argue strongly about this. If both you and Nick are OK with keeping the customization mechanisms in, I defer to your judgment.
Eli, remember that TOOWTDI stands for "There's one *obvious* way to do it" rather than "There's *only one* way to do it". The latter interpretation leads to insanely complex APIs that attempt to solve everyone's problems, while the former favours 80% solutions that cover most use cases, with extension hooks that let people handle the other 20% as they see fit.
The point of the enum standardisation is to have a conventional way that enums *behave*, and then allow variations on that theme for those cases where the stdlib implementation is "close, but not quite what I need or want".
The whole metaclass machinery is built around this concept of letting people create domain specific behaviour, that is still somewhat unified due to conventions like the descriptor protocol. You can do a *lot* with just descriptors, so if you don't need a custom metaclass, you shouldn't use one.
PEP-422's class initialisation hook is aimed specifically at certain cases that currently need a metaclass and providing a simpler way to do them that lets you just use "type" as the metaclass instead.
It's the same with enums - if you don't need to customise the metaclass, you shouldn't. But there are some use cases (such as syncing the Python level enum definition with a database level one) where additional customisation will be needed. We also want to give people the freedom they need to experiment with different forms of definition time syntactic sugar to see if they can come up with one we like enough to add to the standard library in 3.5.
Does documenting these definition time extension points constrain what we're allowed to do in the future? Yes, it does. But, at the same time, it takes a lot of pressure off us to add more features to the standard enum type over time - if people have niche use cases that aren't handled well by the standard solution (and we already know they do), we can point them at the supported extension interface and say "go for it". For the majority of users though, the standard enum type will work fine, just as ordinary classes are adequate for the vast majority of object oriented code.
Somewhat related, I *know* you've read type.__new__. Compared to that, enum.EnumMeta.__new__ is still pretty straightforward ;)
Working on documentation...
Hopefully the final bit of code, plus docs.
Code changes:
_names_ are reserved
Doc changes (different from the PEP):
examples of AutoEnum, UniqueEnum, and OrderedEnum
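For reference, the OrderedEnum recipe mentioned above is along these lines (a sketch in the spirit of the PEP-435 examples; the version in the docs patch may differ in detail):

from enum import Enum

class OrderedEnum(Enum):
    """Enum whose members can be ordered by their values."""
    def __ge__(self, other):
        if self.__class__ is other.__class__:
            return self.value >= other.value
        return NotImplemented
    def __gt__(self, other):
        if self.__class__ is other.__class__:
            return self.value > other.value
        return NotImplemented
    def __le__(self, other):
        if self.__class__ is other.__class__:
            return self.value <= other.value
        return NotImplemented
    def __lt__(self, other):
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented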
Apologies for the noise -- was having trouble getting the correct patch attached. :/
So, which is better?
To have a @unique class decorator as part of the module, or to have a UniqueEnum recipe in the docs?
A decorator is immediately usable, but requires remembering an extra line of code.
An example requires being put into a local utility module, but also serves as a guide to customising Enums. (Took me about five tries to get it right. :/ )
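For concreteness, a sketch of what the decorator variant could look like (names and error wording are illustrative; whether anything like this lands in the stdlib is exactly the open question):

from enum import Enum

def unique(enumeration):
    """Class decorator: raise ValueError if any member is an alias."""
    duplicates = [(name, member.name)
                  for name, member in enumeration.__members__.items()
                  if name != member.name]
    if duplicates:
        details = ', '.join('%s -> %s' % pair for pair in duplicates)
        raise ValueError('duplicate values found in %r: %s'
                         % (enumeration, details))
    return enumeration

# Usage:
@unique
class Color(Enum):
    red = 1
    green = 2
    blue = 3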
Nick prudently moved the unique discussion to its own issue, bpo-18042. Let's get the initial implementation & docs committed first (without unique in the implementation, although it's fine to have it as an example in the docs for now), close this issue, and then discuss in bpo-18042 whether unique should be part of the stdlib-provided API or not.