psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0

Posting a multipart-encoded file with non-ASCII characters in filename doesn't work #3446

Closed ghost closed 8 years ago

ghost commented 8 years ago

The following code doesn't actually upload any data to the server:

r = requests.post('https://gs.smuglo.li/api/statusnet/media/upload',
    auth=('testbot', 'testbot'),
    files={'media': open('/tmp/Снимок экрана_2016-07-27_05-15-38.png', 'rb')})

Lukasa commented 8 years ago

What makes you think that doesn't upload data to the server?

Lukasa commented 8 years ago

Put another way: what is r.request.body?

ghost commented 8 years ago

@lukasa

> What makes you think that doesn't upload data to the server?

The fact that the server returns an XML with following content:

<?xml version="1.0" encoding="UTF-8"?>
<rsp stat="fail">
 <err msg="There is no uploaded media for input field &quot;media&quot;."></err>
</rsp>

> Put another way: what is r.request.body?

>>> r.request.body
b'--254dc93f44a24498bef41502bb23d76f\r\nContent-Disposition: form-data; name="media"; filename*=utf-8\'\'%D0%A1%D0%BD%D0%B8%D0%BC%D0%BE%D0%BA%20%D1%8D%D0%BA%D1%80%D0%B0%D0%BD%D0%B0_2016-07-27_05-15-38.png\r\n\r\n\x89PNG...

And a lot of hex values after that.

Lukasa commented 8 years ago

Yup, so that's why I asked about r.request.body. We are uploading data: you can see it in r.request.body. What's happening is that the server isn't reading it. This is almost certainly because the server doesn't support RFC 2231. See #2313.

You can probably fix this by using the extended syntax for file uploads with an appropriately created byte string:

files = {'media': (u'Снимок экрана_2016-07-27_05-15-38.png'.encode('utf-8'), open('/tmp/Снимок экрана_2016-07-27_05-15-38.png', 'rb'))}
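
For reference, the `filename*=` parameter seen in `r.request.body` can be reproduced with the standard library. This is a minimal sketch, assuming Python 3 and that `email.utils.encode_rfc2231` yields the same form as the header in the request body; `plain_header` is only an illustration of the non-extended syntax and is not something requests emitted in this thread.

```python
from email.utils import encode_rfc2231

filename = 'Снимок экрана_2016-07-27_05-15-38.png'

# RFC 2231 extended value: charset, an empty language tag, then the
# percent-encoded UTF-8 bytes of the file name.
extended = encode_rfc2231(filename, charset='utf-8')

# Header in the style of the one visible in r.request.body.
rfc2231_header = 'Content-Disposition: form-data; name="media"; filename*=' + extended

# Plain quoted parameter, shown only for comparison (illustrative).
plain_header = 'Content-Disposition: form-data; name="media"; filename="{}"'.format(filename)

print(rfc2231_header)
print(plain_header)
```

A server that only looks for the plain `filename="..."` parameter would likely not find a file name in the first header, which would be consistent with the "no uploaded media" response.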

ghost commented 8 years ago

@Lukasa Is this Python 2? Because in Python 3 I get

TypeError: a bytes-like object is required, not 'str'

Lukasa commented 8 years ago

Where do you get that? The .encode call should produce a bytes-like object. Can I see the full traceback?

ghost commented 8 years ago

@Lukasa Sure

>>> filename = '/tmp/Снимок экрана_2016-07-27_05-15-38.png' 
>>> media = {'media':(filename.encode('utf-8'), open(filename, 'rb'))}
>>> r = requests.post(url, auth=('testbot', 'testbot'), files=media)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    r = requests.post(url, auth=(username, password), files=media)
  File "/usr/lib/python3.5/site-packages/requests/api.py", line 111, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/lib/python3.5/site-packages/requests/api.py", line 57, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3.5/site-packages/requests/sessions.py", line 461, in request
    prep = self.prepare_request(req)
  File "/usr/lib/python3.5/site-packages/requests/sessions.py", line 394, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/usr/lib/python3.5/site-packages/requests/models.py", line 298, in prepare
    self.prepare_body(data, files, json)
  File "/usr/lib/python3.5/site-packages/requests/models.py", line 449, in prepare_body
    (body, content_type) = self._encode_files(files, data)
  File "/usr/lib/python3.5/site-packages/requests/models.py", line 155, in _encode_files
    rf.make_multipart(content_type=ft)
  File "/usr/lib/python3.5/site-packages/urllib3/fields.py", line 174, in make_multipart
    (('name', self._name), ('filename', self._filename))
  File "/usr/lib/python3.5/site-packages/urllib3/fields.py", line 134, in _render_parts
    parts.append(self._render_part(name, value))
  File "/usr/lib/python3.5/site-packages/urllib3/fields.py", line 114, in _render_part
    return format_header_param(name, value)
  File "/usr/lib/python3.5/site-packages/urllib3/fields.py", line 35, in format_header_param
    if not any(ch in value for ch in '"\\\r\n'):
  File "/usr/lib/python3.5/site-packages/urllib3/fields.py", line 35, in <genexpr>
    if not any(ch in value for ch in '"\\\r\n'):
TypeError: a bytes-like object is required, not 'str'

Lukasa commented 8 years ago

Ah, I see what's happening there.

So this starts to get really unpleasant. It seems like urllib3 gets mad when using a bytestring in this place on Python 3. Out of interest, try dropping the .encode from the filename?

Either way, the problem seems to be the use of RFC 2231 in this place. urllib3 is looking to make RFC 2231 encoding optional, so this problem should be resolvable in a future release.
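
The `TypeError` in the traceback can be reproduced without requests or urllib3. The sketch below copies only the membership test from the line shown in the traceback (`fields.py`, line 35); `has_special_chars` is a hypothetical name used for illustration, and the file name is shortened.

```python
def has_special_chars(value):
    # Same membership test as the line shown in the traceback.
    return any(ch in value for ch in '"\\\r\n')


text_name = 'Снимок экрана.png'
bytes_name = text_name.encode('utf-8')

# str in str is a valid membership test.
print(has_special_chars(text_name))  # prints False

try:
    # On Python 3, testing a str against bytes raises TypeError.
    has_special_chars(bytes_name)
except TypeError as exc:
    print('TypeError:', exc)
```

On Python 2 a byte string is a `str`, so the same check passes there, which may explain why the `.encode('utf-8')` suggestion only fails on Python 3.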

ghost commented 8 years ago

@Lukasa It gets back to the original error.

<?xml version="1.0" encoding="UTF-8"?>
<rsp stat="fail">
 <err msg="There is no uploaded media for input field &quot;media&quot;."></err>
</rsp>

Lukasa commented 8 years ago

Yeah, so like I said, this is an RFC 2231 concern at this point. It's a urllib3 problem, but one we've got an open PR to solve: shazow/urllib3#856.

ghost commented 8 years ago

@Lukasa That's good to hear. I have a workaround for this: since the file size can't be more than 20 MB for that server, I just do

>>> media = {'media': open(filename, 'rb').read()}
>>> r = requests.post(url, auth=('testbot', 'testbot'), files=media)
>>> print(r.text)
<?xml version="1.0" encoding="UTF-8"?>
<rsp stat="ok" xmlns:atom="http://www.w3.org/2005/Atom">
 <mediaid>84983</mediaid>
 <mediaurl>https://gs.smuglo.li/attachment/84983</mediaurl>
 <media_url>https://gs.smuglo.li/attachment/84983</media_url>
 <size>23815</size>
 <atom:link rel="enclosure" href="https://gs.smuglo.li/file/e1035ccd7c31b07edad49251a8ff2bd6bce96fbda7a2585cd113b137df187d8c.png" type="image/png"></atom:link>
 <media_id>84983</media_id>
 <media_id_string>84983</media_id_string>
 <image w="766" h="317" image_type="image/png"></image>
</rsp>

Lukasa commented 8 years ago

That's good! In that case, let's close this in favour of the open issues.
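
The workaround of passing raw bytes presumably succeeds because a bytes value carries no file name, so no non-ASCII name ends up in the header. An alternative in the same spirit is to send an explicit ASCII-only name. The sketch below is one way to derive such a name; `ascii_upload_name` and its `default` parameter are hypothetical, and dropping non-ASCII characters is an assumption about what the server will accept, not something confirmed in this thread.

```python
import os
import unicodedata


def ascii_upload_name(path, default='upload'):
    """Hypothetical helper: derive an ASCII-only file name from a path."""
    stem, ext = os.path.splitext(os.path.basename(path))
    # Decompose accented characters, then drop whatever is still non-ASCII.
    normalized = unicodedata.normalize('NFKD', stem)
    ascii_stem = normalized.encode('ascii', 'ignore').decode('ascii').strip()
    ascii_ext = ext.encode('ascii', 'ignore').decode('ascii')
    # Fall back to a default stem if nothing ASCII survives.
    return (ascii_stem or default) + ascii_ext


print(ascii_upload_name('/tmp/Снимок экрана_2016-07-27_05-15-38.png'))
print(ascii_upload_name('./images/Ş.jpg'))
```

With the extended tuple syntax shown earlier in the thread, the result could be passed as `files={'media': (ascii_upload_name(filename), open(filename, 'rb'))}`. The trade-off is that the server sees a different file name from the one on disk.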

ghost commented 8 years ago

@Lukasa Be sure to hit me up when that RFC is properly implemented by urllib3, and I will remove that filthy hack I use.

eamirgh commented 5 years ago

Any solutions for Python 3? I have the same problem here:

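import requests

# url, username and password are assumed to be defined earlier (not shown in the post).
with open('db.txt', encoding='utf-8') as db_file:  # db is written in utf-8
    db = db_file.read().splitlines()

# errs was not defined in the post; an error log file named errors.txt is assumed here.
with open('errors.txt', 'w', encoding='utf-8') as errs:
    for i, line in enumerate(db):  # line is './images/APPLE-IPHONE_7_PLUS-SILICON_BLACK.jpg'
        with open(line, 'rb') as image:
            files = {'image': (line, image)}
            r = requests.post(url, files=files, auth=(username, password))
        if r.text != 'done!':
            print(r.request.body)
            errs.write(str(i) + ' ' + line + ' failed!\n')
        else:
            print(i, line)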

I have a problem when non-ASCII characters like 'Ş' occur: print(r.request.body) then prints None.