spring-projects / spring-framework

Spring Framework
https://spring.io/projects/spring-framework
Apache License 2.0

ContentDisposition should allow encoding filename and filename* with different strategies and encodings #31940

Open patrickhuy opened 11 months ago

patrickhuy commented 11 months ago

Affects: 6.1.2

The class org.springframework.http.ContentDisposition should allow setting different encodings for the filename and filename* parameters. According to MDN, the filename parameter exists for compatibility with user agents that don't support "complex" encodings.

We have observed that when the encoding is set to UTF-8, browsers parse the filename correctly (likely from filename*), but other clients, for example curl -OJ, don't decode the UTF-8 encoded filename parameter and instead write a file with a name like =?UTF-8?Q?myFile.txt?=, which is not a nice filename.

Example:

    @Test
    public void testContentDispositionUTF8() {
        var disposition = ContentDisposition.builder("attachment")
                .filename("myFile.txt", StandardCharsets.UTF_8)
                .build();
        assertEquals("attachment; filename=\"=?UTF-8?Q?myFile.txt?=\"; filename*=UTF-8''myFile.txt",
                disposition.toString());
    }

It would be great if it were possible to specify that filename* should be encoded as UTF-8 while filename is encoded in an ASCII-safe way (for example, by discarding all characters > 255).

When choosing the ASCII charset and using ContentDisposition with a filename outside the ASCII range, Tomcat throws an error. For example:

    response.setHeader(HttpHeaders.CONTENT_DISPOSITION,
            ContentDisposition.builder("attachment")
                    .filename("。.txt")
                    .build()
                    .toString());

Tomcat then eventually reports:

java.lang.IllegalArgumentException: The Unicode character [。] at code point [12,290] cannot be encoded as it is outside the permitted range of 0 to 255
    at org.apache.tomcat.util.buf.MessageBytes.toBytesSimple(MessageBytes.java:310) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.tomcat.util.buf.MessageBytes.toBytes(MessageBytes.java:283) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.coyote.http11.Http11OutputBuffer.write(Http11OutputBuffer.java:389) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
    at org.apache.coyote.http11.Http11OutputBuffer.sendHeader(Http11OutputBuffer.java:368) ~[tomcat-embed-core-10.1.16.jar:10.1.16]
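
As a rough illustration of the range check that the message above describes, the following sketch tests whether every character of a header value falls within 0 to 255. It is not Tomcat's code; the class name `HeaderValueCheck` and the method `fitsByteRange` are hypothetical names chosen for this example.

```java
final class HeaderValueCheck {

    private HeaderValueCheck() {
    }

    /**
     * Returns true if every UTF-16 code unit of the value is in the range 0..255,
     * i.e. the range the exception message above names as permitted.
     * Hypothetical helper; it only mirrors the described limit.
     */
    static boolean fitsByteRange(String headerValue) {
        return headerValue.chars().allMatch(c -> c <= 0xFF);
    }
}
```

A value such as `attachment; filename="。.txt"` fails this check because 。 is code point 12,290, so a check like this could be run before calling `response.setHeader` to catch the problem early.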

Because of this, it is not sufficient to allow filename to be encoded as ASCII while filename* is encoded as UTF-8. Instead, we need a way to strip unsafe characters from filename (or encode it in a better-supported format?).
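
Until something like this exists in ContentDisposition, one possible stopgap is to assemble the header value by hand. The sketch below uses only the JDK and is an assumption about what such a helper could look like, not Spring's API or its planned behavior. The class `ContentDispositionSketch` and its methods are hypothetical names. It replaces every character outside printable ASCII (plus the double quote and backslash) with `_` for filename, and percent-encodes the UTF-8 bytes for filename* using the attr-char set from RFC 5987, which RFC 6266 references.

```java
import java.nio.charset.StandardCharsets;

final class ContentDispositionSketch {

    // Marks that RFC 5987 attr-char allows unescaped, in addition to ALPHA and DIGIT.
    private static final String ATTR_CHAR_MARKS = "!#$&+-.^_`|~";

    private ContentDispositionSketch() {
    }

    /** Replaces each code point outside printable ASCII, and '"' and '\', with '_'. */
    static String asciiFallback(String filename) {
        StringBuilder sb = new StringBuilder(filename.length());
        filename.codePoints().forEach(cp -> {
            boolean safe = cp >= 0x20 && cp <= 0x7E && cp != '"' && cp != '\\';
            sb.append(safe ? (char) cp : '_');
        });
        return sb.toString();
    }

    /** Builds the ext-value for filename*: UTF-8'' followed by percent-encoded UTF-8 bytes. */
    static String encodeExtValue(String filename) {
        StringBuilder sb = new StringBuilder("UTF-8''");
        for (byte b : filename.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean attrChar = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || ATTR_CHAR_MARKS.indexOf(c) >= 0;
            if (attrChar) {
                sb.append((char) c);
            } else {
                sb.append('%')
                  .append(Character.toUpperCase(Character.forDigit(c >> 4, 16)))
                  .append(Character.toUpperCase(Character.forDigit(c & 0xF, 16)));
            }
        }
        return sb.toString();
    }

    /** Combines the ASCII fallback and the UTF-8 ext-value into one header value. */
    static String attachment(String filename) {
        return "attachment; filename=\"" + asciiFallback(filename)
                + "\"; filename*=" + encodeExtValue(filename);
    }
}
```

With this sketch, `ContentDispositionSketch.attachment("。.txt")` yields `attachment; filename="_.txt"; filename*=UTF-8''%E3%80%82.txt`, which stays within ASCII and so avoids the Tomcat error above. Replacing unsafe characters with `_` rather than dropping them is an arbitrary choice here; dropping or transliterating them would work just as well for the fallback.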

deg-hrisser commented 3 days ago

Hey, I have also stumbled upon this problem: not all user agents support the currently used encoding, which caused a great deal of confusion. In our case, it was a version of the Postman HTTP client.

Section 5 of RFC 2047 also explicitly states:

+ An 'encoded-word' MUST NOT be used in parameter of a MIME
 Content-Type or Content-Disposition field, or in any structured
 field body except within a 'comment' or 'phrase'.

RFC 6266, Appendix C.1, reiterates this:

RFC 2047 defines an encoding mechanism for header fields, but this
encoding is not supposed to be used for header field parameters --
see Section 5 of RFC2047.

...

In practice, some user agents implement the encoding, some do not
(exposing the encoded string to the user), and some get confused by
it.

It seems to me that the current strategy does not honor these standards, although I fully admit I'm not very knowledgeable about them.

I see this is still generally planned? Is there any workaround short of copying / modifying the Content-Disposition code into our own project?

Thanks for your work :)