zopefoundation / zope.publisher

Map requests from HTTP/WebDAV clients, web browsers, XML-RPC and FTP clients onto Python objects
Other
3 stars 12 forks source link

cgi is going to be deprecated in Python 3.8 and removed in 3.10 #39

Closed mgedmin closed 3 years ago

mgedmin commented 5 years ago

zope.publisher relies on stdlib's cgi.FieldStorage() for HTTP request parsing. PEP-594 indicates that cgi will be removed in Python 3.10. What do we want to do?

jugmac00 commented 5 years ago

As far as I understood the announcement, removing it from Python core does not mean it gets killed. @tiran hopes somebody takes up the maintenance of those packages.

Without digging deeper

Other than that - @d-maurer is about digging real deep into the Object Publisher and maybe can contribute to this discussion.

P.S. @mgedmin you created a ticket here, but FieldStorage is also directly used in Zope.ZPublisher.HTTPRequest

icemac commented 5 years ago

cgi.FieldStorage is also used here: https://github.com/zopefoundation/zope.publisher/blob/b21e67576837a843c51ccb93e2941bf3b43dbcef/src/zope/publisher/browser.py#L613-L620

@jugmac00 zope.publisher is the Zope 3 reimplementation of ZPublisher.

cjwatson commented 4 years ago

cgi.FieldStorage also has some pretty terrible bugs that don't seem to be getting fixed in any kind of useful timeframe (e.g. https://bugs.python.org/issue27777). Maybe vendoring it wouldn't be the worst thing in the world, and then at least it would be possible to make this work properly in zope.publisher?

cjwatson commented 4 years ago

I've been working on this, and have two alternatives reasonably close to being ready:

Having got both approaches more or less working, my current preference is to switch to multipart, although it'd need a new release with https://github.com/defnull/multipart/pull/21 first to avoid ResourceWarnings leaking through on Python 3 (the last release was relatively recent, so I'm hoping this won't take too long). Before this happens and I put the extra few hours of effort into polishing things up for a PR, do any maintainers have opinions on this? multipart is pretty lightweight, but I assume that adding new dependencies to Zope would require at least some degree of consensus.

jugmac00 commented 4 years ago

The mentioned PEP for removing cgi.FieldStorage and other so called "dead batteries" seems to be on halt - there was no discussion for 11 months. Also, I just checked the cpython source and I found no deprecation warning for cgi.

While I appreciate your research a lot, I'd suggest to postpone the decision what to do. Python 3.10' release is due for October 2021. So, there is some time left.

Also, meanwhile and also after reading https://discuss.python.org/t/pep-594-removing-dead-batteries-from-the-standard-library/1704 I think there is a third option available besides vendoring and migrating to multipart.

In the discussion some more users of FieldStorage showed up, including the maintainer of WebOb - so iff cgi will be removed, instead of vendoring, we could create a real package and also publish it on PyPi, so the other users of cgi could join.

cjwatson commented 4 years ago

The reason I'm interested in this isn't solely because of the proposed removal from the stdlib; I'm running into https://bugs.python.org/issue27777 when trying to port a project (lazr.restful) that depends on zope.publisher to Python 3, and this is a major blocker for me at work that I can't really postpone dealing with much further.

It's true that a real package of FieldStorage would be another option, and having a consistent set of bugs across Python versions would be helpful. But the reason I'd like us to move away from it isn't so much the "dead batteries" thing, but rather that it's buggy and crams too much into a single interface in such a way that zope.publisher already has to work around it in difficult-to-follow ways (see the awkward query string handling in processInputs). Parsing query strings, multipart/form-data payloads, and other types of payloads separately would be easier to follow and would avoid at least the bug above.

A fourth option which I see from that discussion link (thanks!) is to use the standard email package to parse multipart/form-data payloads. I hadn't thought of that and it seems likely to result in something of roughly equivalent complexity to using multipart; I'll give it a try as soon as I get a chance. However, by default it parses the entire message into memory, which would likely be unsuitable for some cases involving large file uploads, so I think whether this is viable will depend on whether it's possible to override that behaviour.

jugmac00 commented 4 years ago

It is super unfortunate that https://bugs.python.org/issue27777 does not get much attention. Even worse, there seem to be open pull requests to fix this issue, but no reviewer available. Especially https://github.com/python/cpython/pull/21457 looks promising.

Maybe there is even a 5th option? :-)

webob seems to workaround some bugs with a compat.py module: https://github.com/Pylons/webob/blob/5c062aef9397b27915c5cc2ed2f202bff7494eca/src/webob/compat.py

I think @bertjwregeer would not mind us if we vendored his compat module.

Anyway, this will be a tough decision.

cjwatson commented 4 years ago

webob's compat.py is certainly an option, and respect to them for managing that, but I'd quite strongly prefer to avoid that if possible - it makes it even harder to figure out what's going on, which is already a challenge at times!

jugmac00 commented 4 years ago

@cjwatson If the following is an option for you, maybe you could also speak up at cpython's issue tracker, that above mentioned issue is a blocker to migrate to Python 3, and that launchpad is involved, and also maybe you could run the cpython pr against launchpad's testsuite... maybe this could accelerate the process on cpython's tracker. Who knows.

cjwatson commented 4 years ago

Well, I might, but:

... so I'd much rather put time and energy into a viable replacement.

digitalresistor commented 4 years ago

Feel free to steal WebOb's compat stuff for cgi FieldStorage. I've tried to get upstream stuff fixed, but I just don't have the time to go through the motions, instead I carry the fixes myself.

That being said, I'd love to see a replacement for FieldStorage, and one that has an appropriate test suite, is licensed under something permissive, and is maintained. I've also looked at writing my own replacement in the mean time, but time has been a factor.

I am keen to follow this discussion :-).

cjwatson commented 4 years ago

A fourth option which I see from that discussion link (thanks!) is to use the standard email package to parse multipart/form-data payloads. I hadn't thought of that and it seems likely to result in something of roughly equivalent complexity to using multipart; I'll give it a try as soon as I get a chance. However, by default it parses the entire message into memory, which would likely be unsuitable for some cases involving large file uploads, so I think whether this is viable will depend on whether it's possible to override that behaviour.

I looked at this briefly yesterday evening. I think that was actually more a suggestion of where such a parser might live in the future, rather than something that exists today. While there is a multipart parser inside email.feedparser, it doesn't generalise well to the HTTP case (in particular, I don't see a way to tell it about an outer boundary provided out-of-band by a WSGI CONTENT_TYPE environment variable).

I feel slightly stalled here due to too many options, but I think I'll just polish up both the approaches that I have so far and push them up as PRs. That way at least people can compare actual code rather than theory.

jugmac00 commented 4 years ago

From the PEP discussion page:

Is it still realistic that the dead batteries will be deprecated with 3.8 and the removal will take place in 3.10?

No because Python 3.8 is already out and Python 3.9 is in beta. The PEP will need to be updated to target Python 3.10 with targeted removals in 3.12 (assuming someone drives the PEP to being finished).

icemac commented 1 year ago

@ohlogic That's what @cjwatson also did in this repository in #55.