google / guava

Google core libraries for Java
Apache License 2.0

Could not parse MediaType #6663

Closed asialjim closed 1 year ago

asialjim commented 1 year ago

When I use the Guava API to parse a Content-Type string into a Guava MediaType, an exception is thrown. Maven dependency:

        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>32.1.2-jre</version>
        </dependency>

Code:

        MediaType parse = MediaType.parse("multipart/form-data;charset=UTF-8;boundary =--jelsaflflksafjel");
        System.out.println(parse);

Exception:

/home/asialjim/.jdks/corretto-17.0.7/bin/java -javaagent:/home/asialjim/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/232.8660.185/lib/idea_rt.jar=43365:/home/asialjim/.local/share/JetBrains/Toolbox/apps/IDEA-U/ch-0/232.8660.185/bin -Dfile.encoding=UTF-8 -classpath /home/asialjim/IdeaProjects/test/guava-test/target/classes:/home/asialjim/.m2/repository/com/google/guava/guava/32.1.2-jre/guava-32.1.2-jre.jar:/home/asialjim/.m2/repository/com/google/guava/failureaccess/1.0.1/failureaccess-1.0.1.jar:/home/asialjim/.m2/repository/com/google/guava/listenablefuture/9999.0-empty-to-avoid-conflict-with-guava/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar:/home/asialjim/.m2/repository/com/google/code/findbugs/jsr305/3.0.2/jsr305-3.0.2.jar:/home/asialjim/.m2/repository/org/checkerframework/checker-qual/3.33.0/checker-qual-3.33.0.jar:/home/asialjim/.m2/repository/com/google/errorprone/error_prone_annotations/2.18.0/error_prone_annotations-2.18.0.jar:/home/asialjim/.m2/repository/com/google/j2objc/j2objc-annotations/2.8/j2objc-annotations-2.8.jar io.github.asialjim.frame.remote.Main
Exception in thread "main" java.lang.IllegalArgumentException: Could not parse 'multipart/form-data;charset=UTF-8;boundary =--jelsaflflksafjel'
    at com.google.common.net.MediaType.parse(MediaType.java:1085)
    at io.github.asialjim.frame.remote.Main.main(Main.java:8)
Caused by: java.lang.IllegalStateException
    at com.google.common.base.Preconditions.checkState(Preconditions.java:496)
    at com.google.common.net.MediaType$Tokenizer.consumeCharacter(MediaType.java:1123)
    at com.google.common.net.MediaType.parse(MediaType.java:1063)
    ... 1 more

When I delete the space character between 'boundary' and '=--', like this:

        MediaType parse = MediaType.parse("multipart/form-data;charset=UTF-8;boundary=--jelsaflflksafjel");
        System.out.println(parse);

Then it works!

By the way, this Content-Type, 'multipart/form-data;charset=UTF-8;boundary =--jelsaflflksafjel', with the space character, was generated by trip framework.

cgdecker commented 1 year ago

From what I can tell, that Content-Type is illegal according to RFC 2045, which MediaType conforms to. A property for a media type is of the form attribute=value and the attribute cannot contain a space under any circumstances. The value can only contain a space if it's quoted.

A space seems to be allowed between the ; and the parameter though, e.g. multipart/form-data; charset=UTF-8; boundary=--jelsaflflksafjel. That said, in the time I've been looking at the RFCs I haven't seen anything that obviously states that spaces are allowed there, making me wonder if the RFCs might technically allow spaces around any parts of the syntax (in which case you could also legally write multipart / form-data, and the space after boundary could also be considered skippable rather than an invalid part of the parameter's attribute).

One other thing: the class Javadoc for MediaType states:

Note that this specifically does not represent the value of the MIME Content-Type header and as such has no support for header-specific considerations such as line folding and comments.

So trying to directly parse the value of a Content-Type header to a MediaType may not be the right thing to do anyway unless the API returning the string is accounting for and removing things like comments and line folding.
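To make the preprocessing mentioned above concrete, here is a minimal sketch of unfolding a header value and dropping parenthesized comments before handing the string to a media type parser. The class `HeaderValueCleaner` and its methods are hypothetical names for illustration, not part of Guava, and the handling follows the RFC 822 ideas of line folding, comments, and quoted strings only in simplified form.

```java
class HeaderValueCleaner {

    /** Removes a line break that is immediately followed by a space or tab (a folded line). */
    static String unfold(String value) {
        return value.replaceAll("\\r?\\n(?=[ \\t])", "");
    }

    /** Drops parenthesized comments (which may nest) that appear outside quoted strings. */
    static String stripComments(String value) {
        StringBuilder out = new StringBuilder(value.length());
        int depth = 0;            // current comment nesting level
        boolean inQuotes = false; // true while inside a quoted string
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length() && (inQuotes || depth > 0)) {
                // Backslash escape: keep both characters in a quoted string, drop them in a comment.
                if (depth == 0) {
                    out.append(c).append(value.charAt(i + 1));
                }
                i++;
            } else if (inQuotes) {
                out.append(c);
                if (c == '"') {
                    inQuotes = false;
                }
            } else if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
            } else if (depth == 0) {
                if (c == '"') {
                    inQuotes = true;
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "text/plain;\r\n charset=us-ascii (Plain text)";
        // Unfold first, then strip comments, then trim the leftover whitespace.
        System.out.println(stripComments(unfold(raw)).trim());
    }
}
```

The cleaned string is what a caller would then pass on to a parser such as `MediaType.parse`; whether that parser accepts every cleaned value is not something this sketch guarantees.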

cgdecker commented 1 year ago

Digging around a little more, it looks like according to RFC 822, whitespace should be allowed around the = in a parameter (as well as around the / between the type and subtype, technically at least), though I'm still not 100% clear on that.

asialjim commented 1 year ago

Indeed, according to RFC 2045, that content type is illegal, but when I tried to parse it using the corresponding API of the Spring web framework, it worked properly:

        org.springframework.http.MediaType mediaType = org.springframework.http.MediaType.valueOf("multipart/form-data;charset=UTF-8;boundary =--jelsaflflksafjel");
        System.out.println(mediaType.getCharset());
        System.out.println(mediaType);
        mediaType = org.springframework.http.MediaType.valueOf("multipart/form-data; charset = UTF-8;boundary =--jelsaflflksafjel");
        System.out.println(mediaType.getCharset());
        System.out.println(mediaType);

Console Output

UTF-8
multipart/form-data;charset=UTF-8;boundary=--jelsaflflksafjel
UTF-8
multipart/form-data;charset=UTF-8;boundary=--jelsaflflksafjel

So I believe that if Guava handled this case, it would be more robust.

cgdecker commented 1 year ago

As I mentioned in my second comment, I think my initial read was wrong and it actually is legal according to RFC 2045. I'm thinking it should be fine to allow whitespace before and after any of the separators, /, ;, and =.
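For a parser that rejects such input, a caller-side workaround is to remove the whitespace around the separators before parsing. The following is a minimal sketch of that idea, not Guava's implementation: `MediaTypeWhitespace` and `stripSeparatorWhitespace` are hypothetical names, and the sketch assumes that `/`, `;`, and `=` only appear unquoted as separators (inside a quoted value they are left untouched).

```java
class MediaTypeWhitespace {

    private static boolean isBlank(char c) {
        return c == ' ' || c == '\t';
    }

    private static boolean isSeparator(char c) {
        return c == '/' || c == ';' || c == '=';
    }

    /** Removes spaces and tabs next to '/', ';' and '=' (and at either end) outside quoted strings. */
    static String stripSeparatorWhitespace(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean inQuotes = false;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (inQuotes) {
                out.append(c);
                if (c == '\\' && i + 1 < input.length()) {
                    out.append(input.charAt(++i)); // keep the escaped character verbatim
                } else if (c == '"') {
                    inQuotes = false;
                }
                i++;
            } else if (c == '"') {
                inQuotes = true;
                out.append(c);
                i++;
            } else if (isBlank(c)) {
                int j = i;
                while (j < input.length() && isBlank(input.charAt(j))) {
                    j++; // find the end of this run of whitespace
                }
                boolean afterSeparator = out.length() > 0 && isSeparator(out.charAt(out.length() - 1));
                boolean beforeSeparator = j < input.length() && isSeparator(input.charAt(j));
                boolean atEdge = out.length() == 0 || j == input.length();
                if (!(afterSeparator || beforeSeparator || atEdge)) {
                    out.append(input, i, j); // whitespace elsewhere is preserved
                }
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String header = "multipart/form-data;charset=UTF-8;boundary =--jelsaflflksafjel";
        // The normalized string is the form that parsed successfully earlier in this thread.
        System.out.println(stripSeparatorWhitespace(header));
    }
}
```

Applied to the string from the original report, this yields the variant without the space before `=`, which is the form reported above to parse successfully.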

asialjim commented 1 year ago

Yes, it is.