Closed jecuendet closed 5 years ago
@cezarykluczynski could please describe the problem that the PR solves?
Sorry for the late response. I'll try to look at it on the weekend.
Ok, I'm waiting for an update.
I wasn't been able to reproduce this. As far as my tests show, when there is no charset, the charset is set to UTF-8, and both request body and response body are not corrupted. @jecuendet Could you provide a more detailed example of what worked before for you, and stopped working after PR #63?
What happens in your case is that, for whatever reason, the default charset is UTF-8 (in our case it's ISO-8859-1) In fact it doesn't matter which default it is
The problem arise when the data is sent in a charset different than the default one (ISO or UTF) and Charon parse it with the default charset (since the charset is not sent with the payload)
In this case, the data is corrupted
Currently, when POST with Content-Type application/x-www-form-urlencoded is sent, it is always assumed that it should be encoded as UTF-8. No default system charset is used, at least not in relation to PR #63.
The method linked above should perhaps take into account the charset sent in Content-Type header, and at least recognize the few charset found in java.nio.charset.StandardCharsets. It should also keep fallback to UTF-8 when no charset is sent in header.
What's your opinion, @mkopylec?
That's exactly the problem we have. The request is encoded in ISO-8859-1 and the code above assumes that it's UTF-8, so it corrupts the data. We can not rely on the Content-Type in our case, since the charset is not set
What should be done is to NOT parse the body in this case. There is no way of knowing in which charset it's encoded. What I propose is a flag to disable the parsing, either globally, either per mapping.
Ok, I see what is the problem now.
Correct me if I'm wrong.
The problem is that if a request is a form post request then reading the body using getInputStream()
returns an empty stream. So creating a flag that disables reading the body from request params is not a valid solution because then the body will be empty and request params will be lost during the forwarding process.
If no charset is set in the Content-Type header there is no way to determine the encoding and I think we have to options then:
No irt's not exactly that. It's not a problem with getInputStream returning null. getInputStream returns the stream, that's fine But is this stream is encoded in ISO-8859-1 (which is the case of our application), then Charon decodes it to Parameters using default UTF-8 and so Parameters are corrupted (ISO decoded as UTF-8). If we use those Parameters, the values are completey garbage ... And there is no way of knowing which is the right charset to use since in this case, our application doen't set the Content-Type charset... (only www-for-urlencoded)
The only way of doing it right, is to NOT pare the body, and NOT decode it then re-encode it before forwarding the request.
This is the goal of this issue: permit to add a flag to NOT parse the body if wanted.
Ok, so if it behaves like @jecuendet says, then I don't understand why should we read request params in case of form data POST request instead of just relying on input stream. @cezarykluczynski ?
That's exactly the point of this issue! :-) Thanks IMO, it's useless to parse the body then re-encode it to forward the request. Reverting #63 would not be a regression I think.
@cezarykluczynski you can set charon's filter order to HIGHEST_PRECEDENCE
to prevent getting empty body if the request is a form POST request.
The problem here is that if charon's filter is configured with the default order then something is consuming the request stream earlier in the filter chain. But this happens only if it is a form POST request.
@mkopylec Version 3.0.1 works with HIGHEST_PRECEDENCE, does not work without it. If you think that's the best compromise here, I'm fine with that. Who should do a PR with the revert?
I’ll do it
Done in 3.2.0
Thanks, it's exactly the problem we had. In our case, the Charon filter priority is overriden to HIGHEST, that's why we had not the same result as you.
Now HIGHEST_PRECEDENCE
is the default Charon's filter order
In PR #63 you merged code to parse form-urlencoded body I don't understand why. Which issue did it solved? I can't find the one that was related to PR #63
This can be found in RequestDataExtractor.extractBody() and RequestDataExtractor.getBodyFromServletRequestParameters()
In our case, it doesn't work. We have some applications that send form-urlencoded body as UTF-8, don't put the charset in the content-type while sending. So Charon uses the default charset, ISO-8859-1, which corrupts the body.
What about adding a flag that disable the parsing if not wanted?