Why parse from body - Githubissues

jecuendet commented 5 years ago

In PR #63 you merged code to parse form-urlencoded body I don't understand why. Which issue did it solved? I can't find the one that was related to PR #63

This can be found in RequestDataExtractor.extractBody() and RequestDataExtractor.getBodyFromServletRequestParameters()

In our case, it doesn't work. We have some applications that send form-urlencoded body as UTF-8, don't put the charset in the content-type while sending. So Charon uses the default charset, ISO-8859-1, which corrupts the body.

What about adding a flag that disable the parsing if not wanted?

mkopylec commented 5 years ago

@cezarykluczynski could please describe the problem that the PR solves?

cezarykluczynski commented 5 years ago

Sorry for the late response. I'll try to look at it on the weekend.

mkopylec commented 5 years ago

Ok, I'm waiting for an update.

cezarykluczynski commented 5 years ago

I wasn't been able to reproduce this. As far as my tests show, when there is no charset, the charset is set to UTF-8, and both request body and response body are not corrupted. @jecuendet Could you provide a more detailed example of what worked before for you, and stopped working after PR #63?

jecuendet commented 5 years ago

What happens in your case is that, for whatever reason, the default charset is UTF-8 (in our case it's ISO-8859-1) In fact it doesn't matter which default it is

The problem arise when the data is sent in a charset different than the default one (ISO or UTF) and Charon parse it with the default charset (since the charset is not sent with the payload)
In this case, the data is corrupted

cezarykluczynski commented 5 years ago

https://github.com/mkopylec/charon-spring-boot-starter/blob/41f72891a7592ad542b2d24a39af17162b9f7b21/src/main/java/com/github/mkopylec/charon/core/http/RequestDataExtractor.java#L79

Currently, when POST with Content-Type application/x-www-form-urlencoded is sent, it is always assumed that it should be encoded as UTF-8. No default system charset is used, at least not in relation to PR #63.

The method linked above should perhaps take into account the charset sent in Content-Type header, and at least recognize the few charset found in java.nio.charset.StandardCharsets. It should also keep fallback to UTF-8 when no charset is sent in header.

What's your opinion, @mkopylec?

jecuendet commented 5 years ago

That's exactly the problem we have. The request is encoded in ISO-8859-1 and the code above assumes that it's UTF-8, so it corrupts the data. We can not rely on the Content-Type in our case, since the charset is not set

What should be done is to NOT parse the body in this case. There is no way of knowing in which charset it's encoded. What I propose is a flag to disable the parsing, either globally, either per mapping.

mkopylec commented 5 years ago

Ok, I see what is the problem now. Correct me if I'm wrong. The problem is that if a request is a form post request then reading the body using getInputStream() returns an empty stream. So creating a flag that disables reading the body from request params is not a valid solution because then the body will be empty and request params will be lost during the forwarding process. If no charset is set in the Content-Type header there is no way to determine the encoding and I think we have to options then:

fallback to default charset (UTF-8 or a configurable one)
try to read POST params as raw bytes stream somehow (I don't even know if this is possible)

jecuendet commented 5 years ago

No irt's not exactly that. It's not a problem with getInputStream returning null. getInputStream returns the stream, that's fine But is this stream is encoded in ISO-8859-1 (which is the case of our application), then Charon decodes it to Parameters using default UTF-8 and so Parameters are corrupted (ISO decoded as UTF-8). If we use those Parameters, the values are completey garbage ... And there is no way of knowing which is the right charset to use since in this case, our application doen't set the Content-Type charset... (only www-for-urlencoded)

The only way of doing it right, is to NOT pare the body, and NOT decode it then re-encode it before forwarding the request.

This is the goal of this issue: permit to add a flag to NOT parse the body if wanted.

mkopylec commented 5 years ago

Ok, so if it behaves like @jecuendet says, then I don't understand why should we read request params in case of form data POST request instead of just relying on input stream. @cezarykluczynski ?

jecuendet commented 5 years ago

That's exactly the point of this issue! :-) Thanks IMO, it's useless to parse the body then re-encode it to forward the request. Reverting #63 would not be a regression I think.

mkopylec commented 5 years ago

@cezarykluczynski you can set charon's filter order to HIGHEST_PRECEDENCE to prevent getting empty body if the request is a form POST request. The problem here is that if charon's filter is configured with the default order then something is consuming the request stream earlier in the filter chain. But this happens only if it is a form POST request.

cezarykluczynski commented 5 years ago

@mkopylec Version 3.0.1 works with HIGHEST_PRECEDENCE, does not work without it. If you think that's the best compromise here, I'm fine with that. Who should do a PR with the revert?

mkopylec commented 5 years ago

I’ll do it

mkopylec commented 5 years ago

Done in 3.2.0

jecuendet commented 5 years ago

Thanks, it's exactly the problem we had. In our case, the Charon filter priority is overriden to HIGHEST, that's why we had not the same result as you.

mkopylec commented 5 years ago

Now HIGHEST_PRECEDENCE is the default Charon's filter order

mkopylec / charon-spring-boot-starter

Why parse from body #76