Closed sbarba closed 9 years ago
Are you sure?
$ echo "file file file.\n" >> 漢字.o8d
$ ls
漢字.o8d
>>> import requests
>>> r = requests.post('http://httpbin.org/post', files={'file': open(u'漢字.o8d', 'r')})
>>> print r.content
{
"args": {},
"data": "",
"files": {},
"form": {
"file": "file file file.\n"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connect-Time": "2",
"Connection": "close",
"Content-Length": "180",
"Content-Type": "multipart/form-data; boundary=3491ae0e5b6d465aaebb7bd63c9c750c",
"Host": "httpbin.org",
"Total-Route-Time": "0",
"User-Agent": "python-requests/2.4.0 CPython/2.7.8 Darwin/14.0.0",
"Via": "1.1 vegur",
"X-Request-Id": "f05915c9-279e-4187-8425-f0b06fc64ea2"
},
"json": null,
"origin": "77.99.146.203",
"url": "http://httpbin.org/post"
}
Seems like httpbin doesn't have a problem. Can you confirm what version of requests you're using?
Oh hang on. Interestingly, httpbin sees it as a form field, not a file object. Hmm.
Oh, yes, I remember now.
POSTing files with Unicode filenames is awkward, because you didn't say what text encoding you want us to use. There's a spec for this (RFC 2231), which we implement, but relatively few others do, and many servers don't understand it.
My suggested workaround would be to set the filename yourself using whatever encoding you choose. Unfortunately, that doesn't work:
Traceback (most recent call last):
File "testy.py", line 4, in <module>
r = requests.post('http://httpbin.org/post', files={'file': (u'漢字.o8d'.encode('utf-8'), open(u'漢字.o8d', 'r'))})
File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 88, in post
return request('post', url, data=data, **kwargs)
File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 434, in request
prep = self.prepare_request(req)
File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 372, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 299, in prepare
self.prepare_body(data, files)
File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 434, in prepare_body
(body, content_type) = self._encode_files(files, data)
File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 151, in _encode_files
rf.make_multipart(content_type=ft)
File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/fields.py", line 173, in make_multipart
(('name', self._name), ('filename', self._filename))
File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/fields.py", line 133, in _render_parts
parts.append(self._render_part(name, value))
File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/fields.py", line 113, in _render_part
return format_header_param(name, value)
File "/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/fields.py", line 37, in format_header_param
result.encode('ascii')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 10: ordinal not in range(128)
The problem here seems to be this line. This unconditional call to encode will actually cause an implicit call to str.decode on Python 2, which breaks for non-ASCII characters. @shazow, you prepared to consider that a bug?
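To illustrate the implicit decode described above, here is a minimal sketch. It assumes Python 3, which has no implicit step, so the decode that Python 2 performs behind the scenes is written out explicitly. The helper name `py2_style_encode_ascii` is hypothetical and is not part of requests or urllib3.

```python
# The filename the caller passed: UTF-8 bytes (a plain `str` on Python 2).
filename_bytes = u'漢字.o8d'.encode('utf-8')
header_bytes = b'filename="' + filename_bytes + b'"'


def py2_style_encode_ascii(byte_string):
    """Mimic Python 2's str.encode('ascii'): decode as ASCII first, then encode."""
    return byte_string.decode('ascii').encode('ascii')


try:
    py2_style_encode_ascii(header_bytes)
except UnicodeEncodeError:
    # This is the only error the library code in the traceback guards against.
    outcome = 'encode error'
except UnicodeDecodeError as exc:
    # The implicit decode fails first, so the guard above never sees it.
    outcome = 'decode error at position %d' % exc.start

print(outcome)
```

The failing position lines up with the traceback: `filename="` is ten bytes long, so the first non-ASCII byte (0xe6) sits at index 10.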
Django now supports this and was appreciative of the bug report. The fact that httpbin doesn't parse this correctly is a flask/werkzeug bug I think.
Just discovered that 漢字 is Japanese Kanji and means "Chinese Characters". Enjoyed that, but the bug still stands. For now I'm able to automate testing of such filenames with Selenium, but it'd be nice to do it with requests too.
Except it's a bug in the server you're trying to upload to, for not supporting a 10-year-old RFC.
Is there any other workaround different than changing the file name or changing the server backend?
I think someone percent-encoded the file name because whatever server they were communicating with understood that. That behaviour is not defined anywhere, though, so it depends on the server you're using doing something incredibly bad and horribly wrong.
And @kampde thanks for searching for prior issues and for not opening a new issue.
I don't believe so. No. That's for HTTP Headers, not for mime-headers
@kampde after a quick skim, that is the correct RFC. As you can see it is 18 years old.
I think in https://github.com/kennethreitz/requests/blob/master/requests/packages/urllib3/fields.py#L37
    try:
        result.encode('ascii')
    except UnicodeEncodeError:
        pass
    else:
        return result
Changing this to "result.encode('utf8')" would be better, because most servers can handle UTF-8, but many of them do not support the "email.utils.encode_rfc2231(value, 'utf-8')" style.
@zhangchunlin What does 'most servers' mean? Which servers? Which versions of those servers? Why don't they implement RFC 2231?
@zhangchunlin if those servers do not implement a standard that is 18 years old, I fail to see why we should be forced to violate the standard.
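For readers following this exchange, the two parameter styles under discussion can be compared with a short sketch. It assumes Python 3 and uses the standard library's `email.utils.encode_rfc2231`. The helper `render_filename_param` is hypothetical: it only mirrors the try/except shape quoted above and is not the urllib3 implementation.

```python
from email.utils import encode_rfc2231


def render_filename_param(value):
    """Hypothetical helper: quoted form for ASCII names, RFC 2231 form otherwise."""
    quoted = 'filename="%s"' % value
    try:
        quoted.encode('ascii')
    except UnicodeEncodeError:
        # Non-ASCII name: fall back to the extended `filename*=` syntax.
        return 'filename*=' + encode_rfc2231(value, 'utf-8')
    return quoted


ascii_param = render_filename_param('bob.txt')
kanji_param = render_filename_param('漢字.o8d')

print(ascii_param)   # filename="bob.txt"
print(kanji_param)   # filename*=utf-8''%E6%BC%A2%E5%AD%97.o8d
```

A server that only looks for a plain `filename=` parameter will not find one in the second form, which would be consistent with httpbin reporting the upload as a form field rather than a file.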
@Lukasa OK, I haven't tested much, so my statement may be wrong. I just found that requests doesn't behave the same way as a browser (Chrome, for example), and I assumed that the method Chrome uses is workable.
@sigmavirus24 I will try to pin this down and submit issues to those servers if needed.
It seems PHP is also affected by this: if you try to upload a file named 'fårikål.txt' to a server running PHP, it throws a warning: "PHP Warning: File Upload Mime headers garbled in Unknown on line 0".
This is PHP 5.6.14.
@WishCow I'm not certain what result you expect to see if you're filing a PHP bug against another project. It seems frameworks in Perl, Ruby, and Python all appropriately support RFC 2231. If PHP 5.6.14 doesn't support an 18 year old standard, you should file a bug with PHP.
Just leaving a note here, in case other people encounter this issue, it took me a long time to find the cause.
@WishCow you'll probably have a better time putting together some minimal bit of PHP code and filing a bug with PHP. This comment will help others, but filing a bug to get this fixed in PHP would help a lot more people.
Actually I was about to do that, and I whipped up a quick example of the upload with curl, but that seems to work. Now I'm confused: is there another RFC describing how filenames should be handled that curl (and PHP) might be implementing?
So this:
curl -v -F får.txt=@/tmp/test.txt http://myserver.local
Does produce the correct output from the handling PHP script.
Run netcat locally and send the curl request to that.
Curl might be violating the RFC because support for the spec has lagged behind.
The command
curl -F får='@/tmp/test.txt;filename=får.txt' localhost:14511
Results in the netcat output:
POST / HTTP/1.1
Host: localhost:14511
User-Agent: curl/7.45.0
Accept: */*
Content-Length: 198
Expect: 100-continue
Content-Type: multipart/form-data; boundary=------------------------fb94c2e958ada9f0
--------------------------fb94c2e958ada9f0
Content-Disposition: form-data; name="får"; filename="får.txt"
Content-Type: text/plain
hello world
--------------------------fb94c2e958ada9f0--
So curl indeed does not seem to use the *= format that the RFC describes.
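For comparison with the curl output, a header that does use the `*=` form can be decoded on the receiving side with Python's standard library. This is only a sketch, assuming Python 3 and the legacy `email.message.Message` API. The header text is hand-written for illustration and was not captured from a real request.

```python
from email import message_from_string

# Hand-written part headers using the RFC 2231 extended parameter syntax.
raw_part_headers = (
    "Content-Disposition: form-data; name=\"file\"; "
    "filename*=utf-8''%E6%BC%A2%E5%AD%97.o8d\n"
    "\n"
)

part = message_from_string(raw_part_headers)

# get_filename() reads the Content-Disposition filename parameter and
# collapses the (charset, language, value) triple into a plain string.
decoded_name = part.get_filename()
print(decoded_name)
```

A receiver that skips this decoding step sees no usable `filename` parameter at all.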
Yeah, so you can use httpie to produce a cURL-like command that will probably trigger this for you.
You could also write some PHP that uses RFC 2231.
The SO post describes how to send files with the correct encoding, but I need to receive files, for which there doesn't seem to be a way, since the $_FILES superglobal gets populated before the userland script runs.
Thanks for the help though, in case someone else wants to track this in PHP: https://bugs.php.net/bug.php?id=70794
@WishCow right, that's what I meant (instead of using curl use PHP).
So I ran into this issue with a PHP server running Zend 1. The solution I came up with was to import urllib and then encode the filename like so: files = {'file': (urllib.pathname2url(event.pathname), 'rb')}, and it solved the problem for me. Just adding this in case it helps someone else who runs into this.
That fix proved to introduce new problems because it changed the filenames in weird ways. Instead, I'm working on getting this PR in urllib3 reopened, which would use HTML5 encoding rather than RFC 2231 by default. Hopefully this will allow the problem to be fixed for requests as well. I managed to rewrite the request I was using with my patched version of urllib3, based on the currently closed PR, and it worked.
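For anyone tempted by the percent-encoding workaround mentioned above, this sketch shows what it does to a name. It assumes Python 3 and uses `urllib.parse.quote` as a stand-in for the Python 2 `urllib.pathname2url` call. The file names are illustrative only.

```python
from urllib.parse import quote, unquote

original_name = '漢字.o8d'

# The workaround sends an ASCII-only, percent-encoded name in the header.
sent_name = quote(original_name)
print(sent_name)

# Plain ASCII names change too whenever they contain reserved characters.
spaced_name = quote('my file.txt')
print(spaced_name)

# The original name only comes back if the receiver chooses to unquote it,
# which no standard obliges it to do for this parameter.
recovered_name = unquote(sent_name)
```

Unless the server unquotes the value, the stored file keeps the percent-encoded name, which is one way the filenames can end up changed.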
This code:
requests.post(url, files={"file": open(u"漢字.o8d", "r")})
will return a 200, but the file is never uploaded.
I can upload that file by posting in the browser so this doesn't seem to be a server-side issue. Also, if I change the name of the file to "bob" or something ASCII it works perfectly.