Closed homm closed 10 years ago
This is actually a bug in Django then. Using this syntax is what we're supposed to be using. We have to indicate to the server that we're sending a field whose content is not ASCII or Latin-1 (ISO-8859-1). The proper way to do so is the syntax you see there. It is defined in Section 3.2.1 of RFC 5987. This bug should be filed against Django instead for not conforming to the proper handling of that value. Edit Note specifically that parameter
is redefined as reg-parameter
or ext-parameter
where ext-parameter
is defined as parmname
(e.g., filename
) concatenated with a *
character followed by = and the ext-value
. This confirms that this is the proper handling of those header values. I believe this RFC also is applied to MIME header values which are what you're using in your multipart/form-data
upload.
I test php 5.5.9 server, and it also does not understand form-data with filename*
attribute as files. I think requests should be close to real world then to rfc, especially when compromise is possible. How about both attributes?
It's unclear what the compromise should be. What text encoding should we use?
Compromise is using both attributes. Encoding is not so important. It can be even filename=blah.bin
, this will be enough for server to recognize files.
Filename is hugely important. If we just make random choices, I guarantee that people will simply raise a new bug report claiming that we mangled their filenames.
If you don't like the way we do it, encode the filename yourself. =)
Feel the difference: "I can't even upload file to majority of servers" and "server does not understand file name".
There are two cases:
filename*
property. No problems.By the way, before 2.0 requests use some encoding, and as I know everything works well.
Resist the temptation to say "Understands unicode". That phrase is meaningless when it comes to networked connections. The server needs to understand the specific text encoding we've chosen, and there is no consistency here. We will get this wrong, and what happens in this situation is totally unclear. Some servers will decode the text and get lucky, and read the filename as something totally obtuse and unclear, and that would be terrible.
I would rather fail clearly and allow the user to make the choice than to guess and get it wrong. This is not PHP.
Compromise is using both attributes
How would using both attributes work? If you read the grammar I linked to, it updates the meaning to say that you can only send one, i.e., either filename=
or filename*=
. If you send both, you're sending the wrong thing.
By the way, before 2.0 requests use some encoding, and as I know everything works well.
It worked purely by chance that those characters could be encoded to bytes. We were just sending whatever the bytestring was (if you look at the string it is what you would get if you took your original string and called encode('<encoding>')
. You get the literal bytes and that's what we sent. You could potentially still encode it yourself and then pass that as the filename. You really should only be giving us bytes anyway.
Seeing as this is not a bug, and there is no good reason for us to change how we handle what is given us by the user, I'm closing this.
How would using both attributes work? If you read the grammar I linked to, it updates the meaning to say that you can only send one. If you send both, you're sending the wrong thing.
This is totally wrong.
This specification suggests that a
parameter using the extended syntax takes precedence. This would
allow producers to use both formats without breaking recipients that
do not understand the extended syntax yet.
Interesting. I had forgotten that we could send both. Regardless, we can't provide an ASCII representation for you because we can't guess at how to generate ASCII from what you give us. You would still have to generate the value yourself because there's no sane way to provide that functionality via an API.
When I try to encode filename by myself as requests 1.2.3 does, I've got a error:
>>> requests.post('http://ubuntu:8000/', files={'file': (u'файл.png'.encode('utf-8'), '123')})
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 10: ordinal not in range(128)
I can't use just ascii representation for filename
, because there is no way to send both filename
with ascii and encoded filename*
.
There's so much going on in that line of code I don't know where the breakage is. Can you split that up into multiple lines so I can see which action causes that traceback?
>>> filename = u'файл.png'.encode('utf-8')
>>> requests.post('http://ubuntu:8000/', files={
'file': (filename, '123')
})
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-5-42737369472a> in <module>()
----> 1 import requests; print requests.post('http://ubuntu:8000/', files={'file': (u'файл.png'.encode('utf-8'), '123')})
/home/homm/env/env_ucare/local/lib/python2.7/site-packages/requests/api.pyc in post(url, data, **kwargs)
86 """
87
---> 88 return request('post', url, data=data, **kwargs)
89
90
/home/homm/env/env_ucare/local/lib/python2.7/site-packages/requests/api.pyc in request(method, url, **kwargs)
42
43 session = sessions.Session()
---> 44 return session.request(method=method, url=url, **kwargs)
45
46
/home/homm/env/env_ucare/local/lib/python2.7/site-packages/requests/sessions.pyc in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert)
420 hooks = hooks,
421 )
--> 422 prep = self.prepare_request(req)
423
424 proxies = proxies or {}
/home/homm/env/env_ucare/local/lib/python2.7/site-packages/requests/sessions.pyc in prepare_request(self, request)
358 auth=merge_setting(auth, self.auth),
359 cookies=merged_cookies,
--> 360 hooks=merge_hooks(request.hooks, self.hooks),
361 )
362 return p
/home/homm/env/env_ucare/local/lib/python2.7/site-packages/requests/models.pyc in prepare(self, method, url, headers, files, data, params, auth, cookies, hooks)
295 self.prepare_headers(headers)
296 self.prepare_cookies(cookies)
--> 297 self.prepare_body(data, files)
298 self.prepare_auth(auth, url)
299 # Note that prepare_auth must be last to enable authentication schemes
/home/homm/env/env_ucare/local/lib/python2.7/site-packages/requests/models.pyc in prepare_body(self, data, files)
430 # Multi-part file uploads.
431 if files:
--> 432 (body, content_type) = self._encode_files(files, data)
433 else:
434 if data:
/home/homm/env/env_ucare/local/lib/python2.7/site-packages/requests/models.pyc in _encode_files(files, data)
147 rf = RequestField(name=k, data=fp.read(),
148 filename=fn, headers=fh)
--> 149 rf.make_multipart(content_type=ft)
150 new_fields.append(rf)
151
/home/homm/env/env_ucare/local/lib/python2.7/site-packages/requests/packages/urllib3/fields.pyc in make_multipart(self, content_disposition, content_type, content_location)
173 """
174 self.headers['Content-Disposition'] = content_disposition or 'form-data'
--> 175 self.headers['Content-Disposition'] += '; '.join(['', self._render_parts((('name', self._name), ('filename', self._filename)))])
176 self.headers['Content-Type'] = content_type
177 self.headers['Content-Location'] = content_location
/home/homm/env/env_ucare/local/lib/python2.7/site-packages/requests/packages/urllib3/fields.pyc in _render_parts(self, header_parts)
136 for name, value in iterable:
137 if value:
--> 138 parts.append(self._render_part(name, value))
139
140 return '; '.join(parts)
/home/homm/env/env_ucare/local/lib/python2.7/site-packages/requests/packages/urllib3/fields.pyc in _render_part(self, name, value)
116 The value of the parameter, provided as a unicode string.
117 """
--> 118 return format_header_param(name, value)
119
120 def _render_parts(self, header_parts):
/home/homm/env/env_ucare/local/lib/python2.7/site-packages/requests/packages/urllib3/fields.pyc in format_header_param(name, value)
41 result = '%s="%s"' % (name, value)
42 try:
---> 43 result.encode('ascii')
44 except UnicodeEncodeError:
45 pass
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 10: ordinal not in range(128)
Hmm, that really looks like that should catch UnicodeDecodeError
s as well, to allow for exactly this case. Or, even more accurately, should check the type of the string, and not bother encoding bytestrings.
Recommend we open a bug report on urllib3 if @shazow agrees.
I have checked Tornado and Werkzeug, no one of them don't know about filename*
. Do you know any web server who reads rfc in same way as you?
Rails as well.
If you're doubtful whether that RFC applies to it, the same grammar is used in Section 4 of RFC 2231 which is more directly related to MIME headers and encoding non-ASCII values in them.
There's also a related discussion on the Django ticketing system that shows the team knows this is something they have to do but hasn't acted on (in over a year).
And for the record, it isn't just a rails issue, it is a rack problem (which does not have a corresponding issue, yet).
Until issues there are in majority of web servers, the issue is in requests.
@homm What is your proposal for requests? The best option I've seen so far is to send both filename
and filename*
in that order. If that happens, what should we put in filename
?
@homm you're inherently wrong. Rack claims to support RFC 2231, this is clearly a failure to properly do so given how clear 2231 (and 5987) are in their grammar. You claim earlier versions of requests do not cause the problem, the solution to your problem then is to clearly use old versions since you seem to think broken behaviour is correct.
@sigmavirus24 I am open to sending both filename
and filename*
, this should be supported, but I don't know how we'd populate filename
.
@Lukasa
1) whatever is enough for file recognition for not compliant server and does not brake others (because "specification suggests that a parameter using the extended syntax takes precedence").
2) it can be value without any non-ascii chars and "unknown" as fallback.
3) should be a way to send encoded filename
if I exactly know how server handles encoded values.
@sigmavirus24 I can't use old requests version because I need SNI on python 2 :(
The problem is that 1) and 2) are wildly ambiguous. Suppose you provide the string u"使用的是"
. What should our fallback name be? How can we determine it? What do we do when a user wants to provide that fallback name themselves?
@Lukasa ostensibly through the toolbelt since that's the only place where users have complete control over the fields but even so, this would probably need to be worked on in urllib3 because I'm fairly certain it doesn't currently provide a way to do this.
Further RFC 2231 is the authority on this since 5987 (as I've re-read it) is for HTTP Headers (not MIME/multipart headers) and 2231 does not allow for multiple parameters (on quick glance over it)
@homm
2) it can be value without any non-ascii chars and "unknown" as fallback.
This is not an accurate representation in the slightest. At best it will introduce utter confusion.
3) should be a way to send encoded filename if I exactly know how server handles encoded values.
You can never know exactly how any server handles these values unless you've written it from scratch yourself.
Let's be clear on our thinking here @homm. What we need is something that satisfies criteria 1) and 2) in an unambiguous way while providing a pleasant API. I contend that no such solution exists, because criteria 1) and 2) are ambiguous and resolving that ambiguity dirties up the API.
Requests does not need to be able to generate every valid HTTP request under the sun, only a subset of them. We've decided how we're approaching this, and we're backed up by the specs. We've also provided you with a workaround. That workaround reveals a bug in urllib3 that we're going to pursue fixing.
Unless you can design a pleasant, non-ambiguous API, we're not going to be too worried about this.
You can never know exactly how any server handles these values unless you've written it from scratch yourself.
I know how exactly servers work. Almost all of them exactly does not understand parameters placed in files
argument of requests.post
function.
Starting from 2.0 requests does not send
filename
attribute ofContent-Disposition
header for multipart files with unicode names. Instead of this attribute with namefilename*
is sent.And this is a big problem, because looks like some systems does not recognize such fields as files. At least we encountered a problem with Django. Django places entire file's content in
request.POST
instead ofrequest.FILES
. It is clear from sources: https://github.com/django/django/blob/1.7c1/django/http/multipartparser.py#L599-L601