sookasa / box.py

Python client for Box
43 stars 25 forks source link

There seems to be an issue uploading files with non-ascii character in their names #30

Open k8n opened 10 years ago

k8n commented 10 years ago

Here is a trace (upload_file executed via box.py) that fails with "status":400,"code":"invalid_request_parameters":

POST /api/2.0/files/content HTTP/1.1
Host: upload.box.com
Content-Length: 47237
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/2.2.1 CPython/2.7.5 Darwin/13.1.0
Content-Type: multipart/form-data; boundary=c5a8408bda51429ba5871377731fc678
Authorization: Bearer ACCESS_TOKEN

--c5a8408bda51429ba5871377731fc678
Content-Disposition: form-data; name="parent_id"

1673257278
--c5a8408bda51429ba5871377731fc678
Content-Disposition: form-data; name*=utf-8''S%C3%A9vigny.pdf; filename*=utf-8''S%C3%A9vigny.pdf

Here is a trace from an equivalent openration via curl:

0000: POST /api/2.0/files/content HTTP/1.1
0026: User-Agent: curl/7.30.0
003f: Host: upload.box.com
0055: Accept: */*
0062: Authorization: Bearer ACCESS_TOKEN
009a: Content-Length: 43177
00b1: Expect: 100-continue
00c7: Content-Type: multipart/form-data; boundary=--------------------
0107: --------37edc5e8d7c9
011d: 
<= Recv header, 23 bytes (0x17)
0000: HTTP/1.1 100 Continue
=> Send data, 161 bytes (0xa1)
0000: ------------------------------37edc5e8d7c9
002c: Content-Disposition: form-data; name="filename"; filename="Se..v
006c: igny.pdf"
0077: Content-Type: application/octet-stream
009f: 
=> Send data, 16384 bytes (0x4000)

Seems Box Content API is not liking RFC 2231-encoded file names. More related discussion on this. And a hack-around?

I'm not sure yet what to tell requests library to send filename unencoded.

k8n commented 10 years ago

And here is a "monkey patch" for requests' field encoder. Say, mod_requests.py ...

#!/usr/bin/env python
# coding: utf-8

__author__ = 'k8n'

import requests
import email.utils
from requests.packages.urllib3.packages import six

def mod_format_header_param(name, value):
    """
    Helper function to format and quote a single header parameter.

    See original doc, minus rfc2331 for non-multiline strings.
    """
    if not any(ch in value for ch in '"\\\r\n'):
        result = '%s="%s"' % (name, value)
        try:
            result.encode('ascii')
        except UnicodeEncodeError:
            pass
        finally:
            return result
    if not six.PY3:  # Python 2:
        value = value.encode('utf-8')
    value = email.utils.encode_rfc2231(value, 'utf-8')
    value = '%s*=%s' % (name, value)
    return value

#monkey patching
requests.packages.urllib3.fields.format_header_param = mod_format_header_param
tals commented 10 years ago

Hey there,

I've sent an email to Box's api support. Let's see what they say :)