Closed zhaoguixu closed 8 years ago
Hmm, this one is really tricky.
I think the best fix here is actually in urllib3: we should catch UnicodeDecodeError
. However, if we catch it, we should return result
. The rationale is that the user has already provided us with a bytestring, so they presumably think they know what they're doing. @shazow, thoughts?
Does this work if the filename is properly encoded in unicode to begin with? If there is no pre-formatted input that works, then yes that should be fixed in urllib3.
Yes, it works when request.files = {'f': (u'føø', u'\xff')}
Hmmm, so my general rule for urllib3, as a library-level module, is garbage-in-garbage-out. I try to avoid converting between bytes/unicode whenever it's direct input, though sometimes it's necessary if it's indirect. I can be convinced otherwise though.
I try to avoid converting between bytes/unicode whenever it's direct input
The problem is that's not true of this function. This function explicitly does exactly that. My proposal is that when it can't, it should just let things keep going.
Oh woops didn't even realize format_header_param was in urllib3, thought it was a requests thing. In that case, yesh +1.
Will be fixed when we next pull in a new version of urllib3.
A simple test case:
The terrible unicode feather of python2.x will implicitly cast the byte-string to unicode.
results
contains non-ascii characters which will be casted to unicode first before callingencode
method. However the catch informat_header_param
UnicodeEnocdeError is not enough. The api specification in this function says value is a unicode string. Actually its a so low-level function. Requests seems ignore this and allow user-decided non-ascii builtin-string flows in, resulting this traceback.Thanks,