Cope with unicode __all__ in Python 2 packages

zopefoundation / zope.configuration

Extensible system for supporting various kinds of configurations

https://zopeconfiguration.readthedocs.io

Other

1 stars 6 forks source link

Cope with unicode all in Python 2 packages #19

Closed cjwatson closed 6 years ago

cjwatson commented 6 years ago

When porting a package from Python 2 to 3, one natural path involves adding from __future__ import unicode_literals everywhere to prepare for the bytes/unicode change. This can cause __all__ to contain unicode elements, which mostly works but breaks star imports as follows (depending on the exact Python version - see https://bugs.python.org/issue21720):

TypeError: Item in ``from list'' not a string

TypeError: Item in ``from list'' must be str, not unicode

Star imports can usually be avoided, but it's hard to avoid this behaviour of zope.configuration if you're using ZCML, so it seems worth adjusting ConfigurationContext.resolve slightly to avoid the problem. The sys.modules logic is borrowed from 2.7's importlib.import_module. (Using importlib directly here is tricky because of the care we take with tracebacks, but for the time being we can still get by with __import__.)

jamadden commented 6 years ago

It seems to me that this scenario is considered an error in Python 2. The response of the Python core devs in 2.7.13rc1 was to make a better error message:

Improve exception message when the type of fromlist is unicode. fromlist parameter of __import__() only accepts str in Python 2 and this will help to identify the problem especially when the unicode_literals future import is used.

It's not clear to me why this code should support packages that are in violation of the language spec. This issue is going to be lurking for those packages when used elsewhere.

That said, the changes to the code are fairly innocuous, so I'd only be a -0 on this personally. @tseaver may have a different opinion (I know he considers unicode_literals to be a code smell, and I've come to agree).

cjwatson commented 6 years ago

Right, I understand that this is at best a bit weird in Python 2, but it's relatively easy to work around here and IMO it's useful to do so.

unicode_literals can be smelly in library code, since it can easily introduce API changes. I wouldn't advocate using it without caution. However, in leaf application code it can be a very useful way to make progress on transitioning. In particular, the strategy I'm following in Launchpad (which is where I ran across this) is to first change the tests to use unicode_literals (watching out for the handful of cases that really need bytes instead), and make sure everything works there; once that works, it will be much easier to change the actual application code; and once we have appropriate versions of all our dependencies we can switch to py3-only. I don't think this strategy runs afoul of the usual arguments against unicode_literals. So far this is the first non-trivial problem I've run into.

tseaver commented 6 years ago

I'm not really in favor of encouraging the __unicode_literals__ hackage: it is an attractive nuisance, because it means the author / maintainer is punting on thinking about the semantics the literals in a module. This consideration gets important because there are actually three cases which matter: must be bytes, must be text, and must be native str (important for dealing with some stdlib modules -- I'm looking at you, email).

cjwatson commented 6 years ago

Or, as in our case, it means that we know that virtually all the literals in legacy code are in fact text even if they lack an explicit u prefix (because we're a database-heavy webapp, and nearly all the literals are either strings destined for a text column or parts of responses that we only want to convert to bytes at the boundary), and going through three-quarters of a million lines of code adding unicode_literals and annotating just the few exceptions to this is an awful lot less arduous than going through adding u prefixes to nearly every single string.

icemac commented 6 years ago

Is there a consensus that this PR can be merged as is?

mgedmin commented 6 years ago

There's an LGTM from me, and it looks like nobody else wanted to review, so I'd say yes?

(I've a habit of not merging pull requests for ZopeFoundation repositories, due to the committer agreement requirement. People who have signed it can merge themselves, and if people haven't signed it, then I'm not supposed to merge their code anyway.)

icemac commented 6 years ago

@cjwatson Did you sign a Zope contributor agreement, to be able to hit the big green merge button?

cjwatson commented 6 years ago

Ah, sorry, I think I missed that something resembling consensus had emerged. Yes, I've signed the contributor agreement. I've added a comment per review and will merge.

zopefoundation / zope.configuration

Cope with unicode __all__ in Python 2 packages #19

Cope with unicode all in Python 2 packages #19