Closed mgedmin closed 3 years ago
As far as I understood the announcement, removing it from Python core does not mean it gets killed. @tiran hopes somebody takes up the maintenance of those packages.
Without digging deeper
Other than that - @d-maurer is about digging real deep into the Object Publisher and maybe can contribute to this discussion.
P.S. @mgedmin you created a ticket here, but FieldStorage is also directly used in Zope.ZPublisher.HTTPRequest
cgi.FieldStorage
is also used here: https://github.com/zopefoundation/zope.publisher/blob/b21e67576837a843c51ccb93e2941bf3b43dbcef/src/zope/publisher/browser.py#L613-L620
@jugmac00 zope.publisher
is the Zope 3 reimplementation of ZPublisher
.
cgi.FieldStorage
also has some pretty terrible bugs that don't seem to be getting fixed in any kind of useful timeframe (e.g. https://bugs.python.org/issue27777). Maybe vendoring it wouldn't be the worst thing in the world, and then at least it would be possible to make this work properly in zope.publisher?
I've been working on this, and have two alternatives reasonably close to being ready:
cgi.FieldStorage
from Python 3.8, adjusted for compatibility with Python 2. This works reasonably well, and has fairly low regression risk since we'd be using essentially the same code (modulo Python versions). On the other hand, it's around 1000 new lines of code in zope.publisher including tests, and it still leaves us with cgi.FieldStorage
and its bugs. We'd just get to fix them ourselves now.cgi.FieldStorage
. It doesn't do the query-string parsing that cgi.FieldStorage
does, but in some ways that's an advantage since we had to work around mismatches in how cgi.FieldStorage
handled things and how we wanted to handle things; with multipart
the glue code is much more straightforward and easier to follow. There would be a somewhat higher risk of regressions due to replacing the parser. (I would attempt to mitigate this by running Launchpad's test suite over a modified zope.publisher, which should serve as a reasonable torture test.) The latest versions no longer support Python 2, but there's an older version on PyPI which does and which seems mostly sufficient for zope.publisher's needs.Having got both approaches more or less working, my current preference is to switch to multipart
, although it'd need a new release with https://github.com/defnull/multipart/pull/21 first to avoid ResourceWarning
s leaking through on Python 3 (the last release was relatively recent, so I'm hoping this won't take too long). Before this happens and I put the extra few hours of effort into polishing things up for a PR, do any maintainers have opinions on this? multipart
is pretty lightweight, but I assume that adding new dependencies to Zope would require at least some degree of consensus.
The mentioned PEP for removing cgi.FieldStorage
and other so called "dead batteries" seems to be on halt - there was no discussion for 11 months. Also, I just checked the cpython source and I found no deprecation warning for cgi
.
While I appreciate your research a lot, I'd suggest to postpone the decision what to do. Python 3.10' release is due for October 2021. So, there is some time left.
Also, meanwhile and also after reading https://discuss.python.org/t/pep-594-removing-dead-batteries-from-the-standard-library/1704 I think there is a third option available besides vendoring and migrating to multipart
.
In the discussion some more users of FieldStorage
showed up, including the maintainer of WebOb - so iff cgi
will be removed, instead of vendoring, we could create a real package and also publish it on PyPi, so the other users of cgi
could join.
The reason I'm interested in this isn't solely because of the proposed removal from the stdlib; I'm running into https://bugs.python.org/issue27777 when trying to port a project (lazr.restful) that depends on zope.publisher to Python 3, and this is a major blocker for me at work that I can't really postpone dealing with much further.
It's true that a real package of FieldStorage
would be another option, and having a consistent set of bugs across Python versions would be helpful. But the reason I'd like us to move away from it isn't so much the "dead batteries" thing, but rather that it's buggy and crams too much into a single interface in such a way that zope.publisher already has to work around it in difficult-to-follow ways (see the awkward query string handling in processInputs
). Parsing query strings, multipart/form-data
payloads, and other types of payloads separately would be easier to follow and would avoid at least the bug above.
A fourth option which I see from that discussion link (thanks!) is to use the standard email
package to parse multipart/form-data
payloads. I hadn't thought of that and it seems likely to result in something of roughly equivalent complexity to using multipart
; I'll give it a try as soon as I get a chance. However, by default it parses the entire message into memory, which would likely be unsuitable for some cases involving large file uploads, so I think whether this is viable will depend on whether it's possible to override that behaviour.
It is super unfortunate that https://bugs.python.org/issue27777 does not get much attention. Even worse, there seem to be open pull requests to fix this issue, but no reviewer available. Especially https://github.com/python/cpython/pull/21457 looks promising.
Maybe there is even a 5th option? :-)
webob
seems to workaround some bugs with a compat.py
module:
https://github.com/Pylons/webob/blob/5c062aef9397b27915c5cc2ed2f202bff7494eca/src/webob/compat.py
I think @bertjwregeer would not mind us if we vendored his compat module.
Anyway, this will be a tough decision.
webob
's compat.py
is certainly an option, and respect to them for managing that, but I'd quite strongly prefer to avoid that if possible - it makes it even harder to figure out what's going on, which is already a challenge at times!
@cjwatson If the following is an option for you, maybe you could also speak up at cpython's issue tracker, that above mentioned issue is a blocker to migrate to Python 3, and that launchpad is involved, and also maybe you could run the cpython pr against launchpad's testsuite... maybe this could accelerate the process on cpython's tracker. Who knows.
Well, I might, but:
cgi.FieldStorage
was a mess and I'd rather use just about anything else in any case... so I'd much rather put time and energy into a viable replacement.
Feel free to steal WebOb's compat stuff for cgi FieldStorage. I've tried to get upstream stuff fixed, but I just don't have the time to go through the motions, instead I carry the fixes myself.
That being said, I'd love to see a replacement for FieldStorage, and one that has an appropriate test suite, is licensed under something permissive, and is maintained. I've also looked at writing my own replacement in the mean time, but time has been a factor.
I am keen to follow this discussion :-).
A fourth option which I see from that discussion link (thanks!) is to use the standard
multipart/form-data
payloads. I hadn't thought of that and it seems likely to result in something of roughly equivalent complexity to usingmultipart
; I'll give it a try as soon as I get a chance. However, by default it parses the entire message into memory, which would likely be unsuitable for some cases involving large file uploads, so I think whether this is viable will depend on whether it's possible to override that behaviour.
I looked at this briefly yesterday evening. I think that was actually more a suggestion of where such a parser might live in the future, rather than something that exists today. While there is a multipart parser inside email.feedparser
, it doesn't generalise well to the HTTP case (in particular, I don't see a way to tell it about an outer boundary provided out-of-band by a WSGI CONTENT_TYPE
environment variable).
I feel slightly stalled here due to too many options, but I think I'll just polish up both the approaches that I have so far and push them up as PRs. That way at least people can compare actual code rather than theory.
From the PEP discussion page:
Is it still realistic that the dead batteries will be deprecated with 3.8 and the removal will take place in 3.10?
No because Python 3.8 is already out and Python 3.9 is in beta. The PEP will need to be updated to target Python 3.10 with targeted removals in 3.12 (assuming someone drives the PEP to being finished).
@ohlogic That's what @cjwatson also did in this repository in #55.
zope.publisher relies on stdlib's
cgi.FieldStorage()
for HTTP request parsing. PEP-594 indicates thatcgi
will be removed in Python 3.10. What do we want to do?cgi
(or maybe justcgi.FieldStorage
) alive longer