Requests 2.5.1 doesn't recognize unicode filenames for uploads

sjagoe commented 9 years ago

After merge of https://github.com/kennethreitz/requests/pull/2379, to allow filenames to be int types, unicode filenames are no longer recognized under Python 2.

This checks that the filename is a builtin str, which has different behaviour on Python 2 and Python 3: requests/utils.py:118: if name and isinstance(name, builtin_str) and name[0] != '<' and name[-1] != '>':

In requests/compat.py, builtin_str is defines as str, which is non-unicode bytes in Python 2 and unicode in Python 3. Perhaps the check should be against basestring, or is this change in behaviour intended?

sjagoe commented 9 years ago

In fact, I have not tried this yet, but I think it will make writing code compatible with Python 2 and Python 3 much more difficult. In working around the Python 2 issue by encoding the unicode filename, it is likely that a Python 3 issue will be introduced. This is going to require version specific encoding handling to ensure a unicode filename under Python 3 and a non-unicode filename under Python 2.

Lukasa commented 9 years ago

@sjagoe I have a question: if you provide us a unicode filename, what encoding should we render that with when transmitting that over the wire?

sjagoe commented 9 years ago

@Lukasa Admittedly, that is an issue. There is no way for you to know. I guess in the past is just encoded to ascii?

In which case, this issue is still two things:

v2.5.1 broke compatibility with v2.5.0.
I have just verified that under Python 3, if the filename is encoded beforehand, then the original issue I reported under Python 2 rears its head. That is, requests does not send the filename and my server sees the filename simply as 'file', instead of the original.

EDIT: To clarify, under Python 3, a Unicode filename is required (any encoded filename, e.g. ascii, utf-8, etc, is not sent to the server) and under Python 2, and encoded filename is required.

Lukasa commented 9 years ago

Agreed.

What we need, I think, is for guess_filename to check on basestring and then call to_native_str.

sjagoe commented 9 years ago

I would even be okay with it if a unicode filename was rejected outright with an exception, requiring the user to explicitly encode, but I don't know what the policy of requests is wrt user-annoying strictness ;-)

sjagoe commented 9 years ago

Of course, in the interest of compatibility, using to_native_str for any 2.5.x fixes would be the best way forward in any case.

sigmavirus24 commented 9 years ago

So the reality is that we need to unbreak this until 3.0 at which point we can redefine the requests interface for this. I have an idea for a solution and am working on a pull request.

abbeycode commented 9 years ago

I'm seeing this issue now in 2.6.0 with the following code:

url = 'https://slack.com/api/files.upload'
with open('File β.txt', 'rb') as file:
    r = requests.post(url, files={'file': file}, params={
        'token': api_token,
        'channels': channel
    })

I get a response that no data was found. I'm using Python 3.

Lukasa commented 9 years ago

@abbeycode Can you run the following snippet for me please?

from requests.utils import guess_filename

with open('File β.txt', 'rb') as file:
    print guess_filename(file)
    print type(guess_filename(file))

sigmavirus24 commented 9 years ago

Luckily Slack is very responsive about these sorts of things and you should kindly inform them that they're not following 10+ year old specifications and they should (namely RFC 2231). I suspect this is more of a duplicate of #2117 than this issue.

Lukasa commented 9 years ago

Yeah, suspect you're right @sigmavirus24. Note that this is a fairly widely-deployed standard, and the HTTPBis is working on bringing it up to date in the draft for RFC 5987bis, so servers and frameworks that don't support it really should.

sigmavirus24 commented 9 years ago

@Lukasa it seems that 5987bis is expired =( (Expires Jan 3, 2015)

Lukasa commented 9 years ago

Work is still ongoing, WG is aiming for last call by IETF Prague.

abbeycode commented 9 years ago

@Lukasa this is what I get:

File β.txt
<class 'str'>

Lukasa commented 9 years ago

Yup, so this means your server doesn't support the RFC that's used to automatically encode binary data. I recommend you encode the name of your file before opening it, or provide the filename yourself:

url = 'https://slack.com/api/files.upload'
with open('File β.txt', 'rb') as file:
    r = requests.post(url, files={'file': ['File β.txt'.encode('utf-8'), file]}, params={
        'token': api_token,
        'channels': channel
    })

abbeycode commented 9 years ago

@Lukasa thanks for all the help, but I'm getting this:

TypeError: Type str doesn't support the buffer API

on this line:

if not any(ch in value for ch in '"\\\r\n'):

in urllib3/fields.py (line 34). This seems like a Python 3 issue, perhaps?

sigmavirus24 commented 9 years ago

Hm. Let me look at that

sigmavirus24 commented 9 years ago

Oh wait, you mean from using @Lukasa's example using encode? Yeah. I think you're better bet is to annoy Slack until it works. I'll have to look into your existing problem first though.

abbeycode commented 9 years ago

I sent Slack an email describing what's happening, so hopefully they'll respond and fix the issues. Thanks @Lukasa and @sigmavirus24 for all the help!

psf / requests

Requests 2.5.1 doesn't recognize unicode filenames for uploads #2411