Closed spring-projects-issues closed 7 years ago
Juergen Hoeller commented
As far as I see, StandardMultipartHttpServletRequest
checks for both filename=
and filename*=
. Could you please double-check why it isn't parsing correctly in your case?
Miroslav Holubec commented
StandardMultipartHttpServletRequest does extractFilenameWithCharset(String) only in case that extractFilename(String) returns null (means no filename=
was present). Also I'm not sure if extractFilenameWithCharset(String) can parse this encoding (RFC-6266 seems to use different encoding without question marks).
Miroslav Holubec commented
Finally found required RFCs. So it seems to be that FormHttpMessageConverter
uses RFC-2047 encoding, which is unsupported by StandardMultipartHttpServletRequest
(that supports only RFC-6266).
Miroslav Holubec commented
possible workaround using commons-fileupload
Rossen Stoyanchev commented
In 4.1.1 #16724 added support for writing encoded file names to FormHttpMessageConverter
based on RFC-2047 which RFC-6266 says should not be used vs RFC-5987 which defines encoding for header field parameters specifically in HTTP. The second issue is that "filename*" should be used instead of "filename" as per RFC-6266, section 4.3.
In 4.2.1 #17904 added support for "filename" in StandardMultipartHttpServletRequest
based on RFC 2231 which is compatible with RFC-5987 but defining a subset of it for use in HTTP. We should however make an improvement to check for "filename" first as recommended in RFC-6266 section 4.3.
In 4.3 #19115 added support for RFC-5987 encoding to HttpHeaders. So it looks like FormHttpMessageConverter could now be updated to use the setContentDisposition(String, String, charset)
and that should produce a "filename*" attribute with RFC-5987 encoding. Hopefully commons-fileupload supports that, we'll have to check.
In 5.0 #18979 extracted content-disposition from HttpHeaders
handling into a separate ContentDisposition
type also moving along RFC-5987 related encoding and further adding support for decoding. Currently ContentDisposition
refers to RFC-2183 (updated by RFC 2231 and hence compatible with RFC-5987) but should really be based on RFC-6266. Also RFC-5987 related encoding and decoding should probably move back into HttpHeaders since it applies more generally to any header parameters.
In 5.0 at least we can also have StandardMultipartHttpServletRequest
delegate to ContentDisposition
for the parsing.
Rossen Stoyanchev commented
This was also recently discussed under #19072. Specifically this from RFC-7578 section 4.3:
In most multipart types, the MIME header fields in each part are
restricted to US-ASCII; for compatibility with those systems, file
names normally visible to users MAY be encoded using the percent-
encoding method in Section 2, following how a "file:" URI
[URI-SCHEME] might be encoded.
NOTE: The encoding method described in [RFC5987], which would add a
"filename*" parameter to the Content-Disposition header field, MUST
NOT be used.
So the most recent advice seems to be that you're supposed to encode file names like URIs but on the server side at least we can't avoid processing "filename*" as per #10655 it seems to be. And also continue to support it in FormHttpMessageConverter through the multipartCharset property.
Sébastien Deleuze, Brian Clozel what's your take on all this?
Miroslav Holubec commented
Hi Rossen, I think you are wrong about saying:
...based on RFC-2047 which RFC-6266 says should not be used. RFC-6266 states clearly: This document does not apply to Content-Disposition header fields appearing in payload bodies transmitted over HTTP, such as when using the media type "multipart/form-data" ([RFC2388]).. Anyway, receiving side should ideally support all possible variants and by my recent opinion, RFC-2047 on sender side should be fine, as it follows MIME standard.
Rossen Stoyanchev commented
Yes you're right that RFC-6266 explicitly excludes the multipart/form-data cases. The RFC-2388 it refers to is obsoleted by RFC-7578 which explains the quote from it, in my last comment.
On the receiving side we should support all options indeed. The question is about the sending side, i.e. in FormHttpMessageConverter
and also in ContentDisposition
/ HttpHeaders
. The former is RFC-2047, the latter RFC-5987. Or maybe the key is for ContentDisposition
to separate out the case of multipart/form-data from other content-disposition types.
We'll have to experiment a bit with browser support too. Here is one article from Mozilla that seems to favor RFC-5987.
Rossen Stoyanchev commented
Looking around at a few Java libraries and servers:
Rossen Stoyanchev commented
A quick experiment with browsers. Firefox and Chrome encode the filename based on the charset of the form (i.e. accept-charset form attribute). This is consistent with what the HTML5 spec says. Apache Commons FileUpload however does not decode that correctly.
Miroslav Holubec commented
Yes, you are right. RFC-7578 seems to be valid document for implementation here. Unfortunately URI encoding is currently not widely supported on receiving sides. Because of that I would vote to support RFC-2047 encoding on sending site, at least until URI encoding support is implemented in commons-fileupload.
About receiving side, as far as I understood we should support RFC-2047, percent encoding and directly encoded header (in non ASCII charset). Support for filename*=
can be dropped as RFC-7578 clearly forbids it.
Juergen Hoeller commented
Let's turn this into a 5.0 topic then, possibly picking up a Commons FileUpload update as well.
Rossen Stoyanchev commented
I've made one change already in FormHttpMessageConverter for the client side to encode part headers directly to the configured charset (UTF-8 by default), also setting the charset of the "Content-Type" header so the server knows how to decode, which is what Apache Commons FileUpload seems to do. The RFC 2047 filename encoding is still there but used only if you set the separate "multipartCharset" property.
I'm moving the ticket to RC2 to complete the server side work. My plan for the server side is to support direct encoding (based on the charset of the "Content-Type" header). I would also drop the RFC 5987 encoding that we currently support since RFC-7578 forbids it. I hesitate to support RFC 2047 mainly because it requires javax.mail as a dependency, which is less than ideal for this reason alone, and because I'm hoping direct encoding goes far enough to meet the need for encoded filenames. As for URI encoding I'm still not sure how would the server detect that is used, would it check for a scheme prefix for example?
Rossen Stoyanchev commented
StandardMultipartHttpServletRequest now supports both RFC 5987 and RFC 2047 filename encoding although the latter works only if javax.mail is present.
Juergen Hoeller commented
rstoyanchev, it just occurred to me that Servlet 3.1 has a getSubmittedFileName()
method on Part
: Since Spring Framework 5.0 is Servlet 3.1+ anyway, shouldn't we simply be delegating to that method? Or at least have an option to delegate to that?
Rossen Stoyanchev commented
Good point.
The Javadoc (and spec) are scant on information about what the method should do.
The Tomcat implementation seems to respect RFC-6266 which we already discussed above shouldn't be used. The Jetty implementation doesn't support any decoding, just unquotes.
We would probably have to have an option to delegate to that, if required, to call it as an alternative to what we have now which supports RFC 5987 and 2047.
Miroslav Holubec opened SPR-15205 and commented
There is wrong behavior when encoding/decoding
Content-Disposition
header. When I do setMultipartCharset() on AllEncompassingFormHttpMessageConverter, it emits header like this:StandardMultipartHttpServletRequest
cannot decode such a name, as it expects that "filename" is followed by asterisk.I'm a bit lost in RFCs, as not sure if RFC 6266 should be applied here (means asterisk required) or not.
Affects: 4.3.6
Issue Links:
19959 Possibility to configure encoding other than ascii in FormHttpMessageConverter.MultipartHttpOutputMessage ("is duplicated by")
16724 FormHttpMessageConverter writes ASCII encoded so that a multipart form data can not contain filenames with German Umlaute
17904 File upload with Servlet 3 multipart requests should support filename with charset encoding (RFC 2231)
18979 Introduce HttpHeaders getContentDisposition() and setContentDisposition()
19115 HttpHeaders.setContentDispositionFormData() doesn't encode non-acsii characters correctly
20097 Unable to emulate upload filename browser behaviour with RestTemplate
Referenced from: commits https://github.com/spring-projects/spring-framework/commit/390bb871d85f56aaeed98aeb45d688fcf35af8e7, https://github.com/spring-projects/spring-framework/commit/bb684ce60b82bb2d3a81828a6780800c51bb4e33, https://github.com/spring-projects/spring-framework/commit/75117f42b827397890060d2297b79bbc1b0c528c, https://github.com/spring-projects/spring-framework/commit/f89511e7fe9c32ad28de2c00a6197ca9ad6ce937, https://github.com/spring-projects/spring-framework/commit/3009e29489567f69d4f458c13eaaf2dbbb38f477