quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0

UTF-8 encoding problem with MultiPart/RestEasy #10323

Closed: Kondamon closed this issue 4 years ago

Kondamon commented 4 years ago

Describe the bug

I am developing a REST API that consumes MULTIPART_FORM_DATA, but I'm having problems with the @Consumes charset. I send some parameters to the controller that include German characters, and it doesn't work as expected. The characters are not decoded properly, which can also be seen when debugging the properties in VSCode.

Some of the characters that cause the problem:

Instead, I receive:

Expected behavior

To Reproduce

Using HTTPie:

http -f POST localhost:8080/general content="Test-Ä" file@testFile.png

Response:

HTTP/1.1 200 OK
Content-Length: 11
Content-Type: text/plain;charset=UTF-8

Test-��

Resource:

package org.acme;

import org.jboss.resteasy.annotations.providers.multipart.MultipartForm;
import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/")
public class FeedbackResource {

    @POST
    @Path("/general")
    @Produces(MediaType.TEXT_PLAIN)
    @Consumes(MediaType.MULTIPART_FORM_DATA + ";charset=UTF-8")
    public String postForm(@MultipartForm final FeedbackBody feedback) {
        // Echo the decoded "content" field so the result can be checked.
        return feedback.content;
    }
}

Model:

package org.acme;

import org.jboss.resteasy.annotations.providers.multipart.PartType;

import javax.ws.rs.FormParam;
import javax.ws.rs.core.MediaType;

public class FeedbackBody {

    private byte[] file;
    public byte[] getFile() {
        return file;
    }

    @FormParam("file")
    @PartType(MediaType.APPLICATION_OCTET_STREAM)
    public void setFile(byte[] file) {
        this.file = file;
    }

    @FormParam("fileName")
    @PartType(MediaType.TEXT_PLAIN)
    public String fileName;

    @FormParam("content")
    @PartType(MediaType.TEXT_PLAIN+";charset=UTF-8")
    public String content;
}

Configuration

<properties>
    <compiler-plugin.version>3.8.1</compiler-plugin.version>
    <maven.compiler.parameters>true</maven.compiler.parameters>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <quarkus-plugin.version>1.5.1.Final</quarkus-plugin.version>
    <quarkus.platform.artifact-id>quarkus-universe-bom</quarkus.platform.artifact-id>
    <quarkus.platform.group-id>io.quarkus</quarkus.platform.group-id>
    <quarkus.platform.version>1.5.1.Final</quarkus.platform.version>
    <surefire-plugin.version>2.22.1</surefire-plugin.version>
</properties>

Environment (please complete the following information):

ejba commented 4 years ago

Hi @Kondamon,

I created a reproducer following your description, and I see the same behavior. However, HTTPie does not specify a Content-Type for the content field. According to the RESTEasy documentation, when a part's content type is not specified, the default us-ascii charset is used. Stepping through with the debugger, this appears to be the origin of the problem (but please check it yourself). I tried several ways to override this default, without success.

Is it possible to enable resteasy.add.charset? That would solve the problem.

Oops, it's enabled by default.
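To see why the response reads Test-�� rather than Test-Ä, here is a minimal sketch in plain Java (no RESTEasy involved) that decodes the UTF-8 bytes of "Test-Ä" once with US-ASCII and once with UTF-8. It assumes the server simply decodes the raw part bytes with whichever charset it picked; the class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Decodes the raw bytes of a form part with the given charset, the way a
    // server would once it has picked a charset for that part.
    static String decodePart(byte[] partBytes, Charset charset) {
        return new String(partBytes, charset);
    }

    // Number of bytes the decoded string occupies when written back as UTF-8,
    // i.e. the Content-Length of a text/plain;charset=UTF-8 echo response.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // The client sends "Test-Ä" as UTF-8: "Test-" plus 0xC3 0x84 for "Ä".
        byte[] sent = "Test-\u00C4".getBytes(StandardCharsets.UTF_8);

        // No charset on the part: US-ASCII is used. Both bytes of "Ä" are
        // invalid in US-ASCII, and each is replaced by U+FFFD.
        String wrong = decodePart(sent, StandardCharsets.US_ASCII);

        // Part declared as text/plain;charset=UTF-8: decoded correctly.
        String right = decodePart(sent, StandardCharsets.UTF_8);

        System.out.println("us-ascii: " + utf8Length(wrong) + " bytes"); // 11 bytes
        System.out.println("utf-8:    " + utf8Length(right) + " bytes"); // 7 bytes
    }
}
```

The byte counts line up with the thread: 5 bytes for "Test-" plus two 3-byte U+FFFD characters gives the Content-Length: 11 of the HTTPie response, while a correctly decoded "Test-Ä" is 7 bytes, matching the Content-Length: 7 of the curl response.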

ejba commented 4 years ago

Including screenshots to better illustrate what I am trying to explain.

[Screenshots: extract-part, add-part-list]

Kondamon commented 4 years ago

Hi @ejba! Thank you for your effort! I tried sending the form with Postman using these settings, but without success; the behavior is still the same. Here the content type is set to UTF-8.

[Screenshot 2020-06-29 at 21:12:47]

ejba commented 4 years ago

I tried again with curl instead and got the result you expected.

~ % curl -v -X POST -F "content=Test-Ä;type=text/plain;charset=utf-8" "http://localhost:8080/hello"
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> POST /hello HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 189
> Content-Type: multipart/form-data; boundary=------------------------bed3af2091f63039
> 
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< Content-Length: 7
< Content-Type: text/plain;charset=UTF-8
< 
* Connection #0 to host localhost left intact
Test-Ä* Closing connection 0

So this is not a bug; it's just a matter of specifying the part's content type.
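The ;type=text/plain;charset=utf-8 suffix makes curl add a Content-Type header to that one body part. A minimal sketch of producing the same per-part header from Java without curl, assuming a hand-built request body; MultipartBodyBuilder is a hypothetical helper written for illustration, not a Quarkus or RESTEasy API:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a multipart/form-data body by hand.
// Intended for a single build() call per instance.
public class MultipartBodyBuilder {

    private final String boundary;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    public MultipartBodyBuilder(String boundary) {
        this.boundary = boundary;
    }

    // Adds a text part that carries its own Content-Type header with an
    // explicit charset, like curl's -F "name=value;type=text/plain;charset=utf-8".
    public MultipartBodyBuilder addTextPart(String name, String value) {
        writeAscii("--" + boundary + "\r\n");
        writeAscii("Content-Disposition: form-data; name=\"" + name + "\"\r\n");
        writeAscii("Content-Type: text/plain; charset=UTF-8\r\n");
        writeAscii("\r\n");
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8); // matches the declared charset
        out.write(bytes, 0, bytes.length);
        writeAscii("\r\n");
        return this;
    }

    // Writes the closing boundary and returns the raw body bytes.
    public byte[] build() {
        writeAscii("--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    // Value for the Content-Type header of the HTTP request itself.
    public String contentType() {
        return "multipart/form-data; boundary=" + boundary;
    }

    private void writeAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }
}
```

To send it with the Java 11+ client, pass contentType() as the request's Content-Type header and build() as the body, e.g. HttpRequest.newBuilder(URI.create("http://localhost:8080/general")).header("Content-Type", builder.contentType()).POST(HttpRequest.BodyPublishers.ofByteArray(builder.build())). The boundary must not occur inside any part's content.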

Kondamon commented 4 years ago

Thank you very much! It seems only curl supports specifying the content type. I added the mime-type="type=text/plain;charset=utf-8" to the form upload and everything works fine now!

batraz90 commented 4 years ago

I can't see how you've solved this issue. öööööaaa always returns �����aaa with multipart.

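// MultiResource.java
import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import org.jboss.resteasy.annotations.providers.multipart.MultipartForm;

@Path("/multi")
@Consumes(MediaType.MULTIPART_FORM_DATA)
public class MultiResource {

    @POST
    @Produces(MediaType.TEXT_PLAIN)
    public String message(@MultipartForm MultiBody multiBody) {
        return multiBody.message;
    }
}

// MultiBody.java
import javax.ws.rs.FormParam;
import javax.ws.rs.core.MediaType;
import org.jboss.resteasy.annotations.providers.multipart.PartType;

public class MultiBody {

    @FormParam("message")
    @PartType(MediaType.TEXT_PLAIN)
    public String message;
}

Request:
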
POST /multi HTTP/1.1
Host: localhost:8080
Accept: */*
Accept-Language: de,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Content-Type: multipart/form-data; boundary=---------------------------197791880114771219241910375257
Content-Length: 186
DNT: 1
Connection: keep-alive
Accept-Charset: utf-8
Pragma: no-cache
Cache-Control: no-cache

Response:

Content-Length: 53
Content-Type: text/plain;charset=UTF-8
����������������AAAAA
ejba commented 4 years ago

@batraz90 Could you show us how you are performing the request?

batraz90 commented 4 years ago

Actually it's a simple HTML form:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
</head>
<body>
    <form action="http://localhost:8080/multi" enctype="multipart/form-data" method="POST">
        <input type="text" name="message">
        <button type="submit"></button>
    </form>
</body>
</html>

But I haven't been able to make it work with curl or any other REST client either.

Kondamon commented 4 years ago

I didn't use an HTML form, but you need to specify the content type of your message input. I guess it could be done like this: <form action="http://localhost:8080/multi" enctype="multipart/form-data" method="POST" accept-charset="utf-8">.

batraz90 commented 4 years ago

I've tried that. Doesn't work.

ejba commented 4 years ago

I tried two ways to specify a charset in an HTML form, and neither worked. Here's a reproducer.

Kondamon commented 4 years ago

I used a framework that implements RFC 2388 and RFC 2045. Each body part of the multipart form has its own headers. To set UTF-8 just for the content field from above, it has to look like this (see bodyParts[2]):

[Screenshot 2020-07-01 at 23:01:12]

batraz90 commented 4 years ago

So it doesn't work with a regular multipart/form-data form?

ejba commented 4 years ago

@batraz90 The problem is that browsers don't set the Content-Type for each body part the way @Kondamon did with the Alamofire library. There's a thread on the quarkus-dev mailing list discussing whether to make the default charset configurable for parts that don't specify one, or to use a better default such as UTF-8.
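The idea discussed here, honoring a part's declared charset and otherwise falling back to a configurable default such as UTF-8, can be sketched independently of RESTEasy. PartCharsetResolver is a hypothetical class for illustration, not RESTEasy's InputPart logic, and its header parsing is deliberately simplified:

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical sketch: picks the charset used to decode one multipart part.
public final class PartCharsetResolver {

    private final Charset defaultCharset;

    // The default is used only when a part carries no usable charset,
    // which is the case for text fields submitted by browsers.
    public PartCharsetResolver(Charset defaultCharset) {
        this.defaultCharset = defaultCharset;
    }

    // Returns the charset named in the part's Content-Type header, or the
    // configured default if the header is missing, has no charset parameter,
    // or names a charset the JVM does not know.
    public Charset resolve(String partContentType) {
        if (partContentType == null) {
            return defaultCharset;
        }
        for (String param : partContentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers IllegalCharsetNameException and UnsupportedCharsetException.
                    return defaultCharset;
                }
            }
        }
        return defaultCharset;
    }

    // Decodes the raw bytes of a part using the resolved charset.
    public String decode(byte[] partBytes, String partContentType) {
        return new String(partBytes, resolve(partContentType));
    }
}
```

With new PartCharsetResolver(StandardCharsets.UTF_8), a browser-submitted field without a per-part Content-Type decodes as UTF-8 instead of US-ASCII, while a part that does declare a charset still takes precedence. Passing the default in through the constructor corresponds to the "make it configurable" option; hardcoding UTF-8 would be the simpler alternative.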

evialle commented 4 years ago

I solved the issue by adding an interceptor (a ContainerRequestFilter) to my code:

package fr.vstudios.leclick.front;

import org.jboss.resteasy.plugins.providers.multipart.InputPart;

import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.ext.Provider;

// Runs for each matched request, before the multipart body is read.
@Provider
public class CharsetInterceptorFilter implements ContainerRequestFilter {

    @Override
    public void filter(ContainerRequestContext context) {
        // Charset RESTEasy uses for parts whose Content-Type declares none,
        // instead of the us-ascii default.
        context.setProperty(InputPart.DEFAULT_CHARSET_PROPERTY, "UTF-8");
    }
}

ejba commented 4 years ago

I was trying to do the same thing, but inside the Vert.x extension. My idea was to allow configuring RESTEasy's DEFAULT_CHARSET_PROPERTY through a Quarkus config property instead, so there would be no need to create a filter for this purpose.

@gsmet Do you think that's possible?

gsmet commented 4 years ago

I think we could hardcode it to UTF-8 and see if people want to make it configurable. UTF-8 is certainly a better default than the current situation.

gsmet commented 4 years ago

We just merged a new quarkus-resteasy-multipart extension into master that fixes this issue. You just need to use it instead of the provider dependency.

The default charset will be UTF-8, but you can change it if needed.

I will backport this to 1.8.1.Final, which I should release on September 30th.

danielFesenmeyer commented 1 year ago

This still seems to be an issue with version 2.15.0.Final. I had to use the CharsetInterceptorFilter mentioned above.