Closed sjagoe closed 9 years ago
In fact, I have not tried this yet, but I think it will make writing code compatible with Python 2 and Python 3 much more difficult. In working around the Python 2 issue by encoding the unicode filename, it is likely that a Python 3 issue will be introduced. This is going to require version specific encoding handling to ensure a unicode filename under Python 3 and a non-unicode filename under Python 2.
@sjagoe I have a question: if you provide us a unicode filename, what encoding should we render that with when transmitting that over the wire?
@Lukasa Admittedly, that is an issue. There is no way for you to know. I guess in the past is just encoded to ascii
?
In which case, this issue is still two things:
'file'
, instead of the original.EDIT: To clarify, under Python 3, a Unicode filename is required (any encoded filename, e.g. ascii
, utf-8
, etc, is not sent to the server) and under Python 2, and encoded filename is required.
Agreed.
What we need, I think, is for guess_filename
to check on basestring
and then call to_native_str
.
I would even be okay with it if a unicode filename was rejected outright with an exception, requiring the user to explicitly encode, but I don't know what the policy of requests is wrt user-annoying strictness ;-)
Of course, in the interest of compatibility, using to_native_str
for any 2.5.x fixes would be the best way forward in any case.
So the reality is that we need to unbreak this until 3.0 at which point we can redefine the requests interface for this. I have an idea for a solution and am working on a pull request.
I'm seeing this issue now in 2.6.0 with the following code:
url = 'https://slack.com/api/files.upload'
with open('File β.txt', 'rb') as file:
r = requests.post(url, files={'file': file}, params={
'token': api_token,
'channels': channel
})
I get a response that no data was found. I'm using Python 3.
@abbeycode Can you run the following snippet for me please?
from requests.utils import guess_filename
with open('File β.txt', 'rb') as file:
print guess_filename(file)
print type(guess_filename(file))
Luckily Slack is very responsive about these sorts of things and you should kindly inform them that they're not following 10+ year old specifications and they should (namely RFC 2231). I suspect this is more of a duplicate of #2117 than this issue.
Yeah, suspect you're right @sigmavirus24. Note that this is a fairly widely-deployed standard, and the HTTPBis is working on bringing it up to date in the draft for RFC 5987bis, so servers and frameworks that don't support it really should.
@Lukasa it seems that 5987bis is expired =( (Expires Jan 3, 2015)
Work is still ongoing, WG is aiming for last call by IETF Prague.
@Lukasa this is what I get:
File β.txt
<class 'str'>
Yup, so this means your server doesn't support the RFC that's used to automatically encode binary data. I recommend you encode the name of your file before opening it, or provide the filename yourself:
url = 'https://slack.com/api/files.upload'
with open('File β.txt', 'rb') as file:
r = requests.post(url, files={'file': ['File β.txt'.encode('utf-8'), file]}, params={
'token': api_token,
'channels': channel
})
@Lukasa thanks for all the help, but I'm getting this:
TypeError: Type str doesn't support the buffer API
on this line:
if not any(ch in value for ch in '"\\\r\n'):
in urllib3/fields.py
(line 34). This seems like a Python 3 issue, perhaps?
Hm. Let me look at that
Oh wait, you mean from using @Lukasa's example using encode? Yeah. I think you're better bet is to annoy Slack until it works. I'll have to look into your existing problem first though.
I sent Slack an email describing what's happening, so hopefully they'll respond and fix the issues. Thanks @Lukasa and @sigmavirus24 for all the help!
After merge of https://github.com/kennethreitz/requests/pull/2379, to allow filenames to be
int
types, unicode filenames are no longer recognized under Python 2.This checks that the filename is a
builtin
str
, which has different behaviour on Python 2 and Python 3:requests/utils.py:118: if name and isinstance(name, builtin_str) and name[0] != '<' and name[-1] != '>':
In
requests/compat.py
,builtin_str
is defines asstr
, which is non-unicodebytes
in Python 2 and unicode in Python 3. Perhaps the check should be against basestring, or is this change in behaviour intended?