Closed boris-petrov closed 1 year ago
From my reading of RFC 2045, which governs the shape of the parameters and values, the allowed tokens are US-ASCII characters: https://www.rfc-editor.org/rfc/rfc2045#section-5.1
token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>
On Fri, Feb 24, 2023 at 1:34 AM Boris Petrov @.***> wrote:
Guava 31.1-jre, JDK 19.
MediaType.parse("application/pdf; name*0=\"кирилица\"")
Leads to an exception:
java.lang.IllegalArgumentException: Could not parse 'application/pdf; name*0="кирилица"'
at com.google.common.net.MediaType.parse(MediaType.java:1078)
at ...
Caused by: java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:486)
at com.google.common.net.MediaType$Tokenizer.consumeToken(MediaType.java:1101)
at com.google.common.net.MediaType.parse(MediaType.java:1066)
... 2 more
I believe this is a valid mime-type declaration? I saw it in the wild coming from Java Mail.
— Reply to this email directly, view it on GitHub https://github.com/google/guava/issues/6339, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEKCYLODWKPRVMYRD3356LWZBP7TANCNFSM6AAAAAAVGSHGNA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
But the quoted-string
content doesn't appear to be defined.
I suspect that the definition is pulled from https://www.rfc-editor.org/rfc/rfc822, which looks like it's restricted to ASCII.
I was editing my comment to say that. And yes, section 3.3 says its contents are CHAR
s, which are restricted to ASCII.
Now perhaps MediaType
should be more lenient. What's the intention there?
I don't remember anything specifically, but I'd guess that our general preference is to be strict if it's not causing problems and then to reevaluate when we hear about those problems.
I pulled probably a dozen MediaType
IllegalArgumentException
bugs up from a quick internal bug search, and none appeared to be about non-ASCII characters, only about other malformed inputs. Now, there were other such bugs that I didn't look at, but I didn't see immediate signs that this is common.
It's especially scary to let unusual characters into headers, since different tools may interpret them differently. I'd probably err on the side of leaving things as they are unless we find more evidence of problems, at which point we might try letting in specific defined ranges of additional characters.
I'm inclined to agree with you. Closing this as WAI, but if you have more data to suggest that this occurs frequently in the wild, @boris-petrov, please reopen.
Guava 31.1-jre, JDK 19.
Leads to an exception:
I believe this is a valid mime-type declaration? I saw it in the wild coming from Java Mail.