Open perey opened 2 months ago
Thanks for the bug report and writeup. Care to make a PR? It'd also be good to add a regression test (turn a small version of your example code in this issue into a test case and assert that the dictionary doesn't contain the bogus entries perhaps).
Bug report
Bug description:
Issue
The
add_handler()
method ofurllib.request.OpenerDirector
matches too broadly, and generates spurious handler entries as a result.Specifically, any method name that starts with zero or one underscore, followed by
error
,open
,request
, orresponse
, and that isn't in the small list of exclusions, will be treated as a handler method.Minimal reproducible example
Cause
https://github.com/python/cpython/blob/401fff7423ca3c8bf1d02e594edfd1412616a559/Lib/urllib/request.py#L419-L421
In the normal case, this code splits a method name like
http_open
on the first underscore character, givingprotocol == 'http'
andcondition == 'open'
.If there is a leading underscore character, this produces a false match with an empty protocol. For example, a method called
_open
will giveprotocol == ''
andcondition == 'open'
.If there is no underscore character, then
meth.find('_') == -1
, and weird results happen. In this case,condition
contains the entire method name, andprotocol
contains all but the last character. For example, a method calledopen
will giveprotocol == 'ope'
andcondition == 'open'
.The subsequent lines branch on the content of
condition
, with onlyerror*
,open
,request
, andresponse
treated as handler methods.Impact
Low. It is probably impossible that a request would have the empty string as its protocol, and unlikely that it would be
ope
,reques
,respons
, orerro
. If one of these did happen, the spurious match wouldn't be called:OpenerDirector
would go looking for (say)ope_open
, probably not find it, and raise anAttributeError
. (It's for this reason that I'm positive this is a bug, and not an undocumented feature whereopen
, etc. are meant to be registered as handlers.)Spurious entries are added to the internal members
handle_open
,handle_error
,process_response
,process_request
, and (especially) the catch-allhandler
, usingbisect
to perform sort-preserving insertions. This is unnecessary busywork, but I have no data on any actual performance impact.Solution
It suffices to add a check like so, immediately after
meth.find("_")
:This catches both a leading underscore (
i == 0
) and no underscore (i == -1
). I've forked the repo and made such a change myself, but submitting a pull request is a big step up from filing an issue, and it expects an issue to be filed first in any case, so… here it is! :smile:Related issues
I can only see one relevant existing issue, #61924. This was a withdrawn proposal to clean up the
add_handler()
code. It was more far-reaching than the solution above, but (on my inspection at least) it would've fixed this bug. At the time, @bitdancer commented, "I'll try to keep this patch in mind if we have occasion to fix anything in that code".The code to extract a specific type of error behaves similarly:
https://github.com/python/cpython/blob/401fff7423ca3c8bf1d02e594edfd1412616a559/Lib/urllib/request.py#L424-L425
However, I'm not positive that it's an error for it to give (for example)
kind == 'error'
when the method name is*_error
(or_error
orerror
), so I haven't explored further.CPython versions tested on:
3.12
Operating systems tested on:
No response