Post Request Body must be read

bdurrer commented 2 years ago

Describe the bug It appears that C# HttpClient throws an exception when EmbedIO answers to an POST request without reading the sent request body first.

To Reproduce Steps to reproduce the behavior:

Take a look at the POC: https://github.com/bdurrer/embedio_post_test
Start the server, then run the client
the 4th request fails
See error

Expected behavior Maybe EmbedIo has to read the body if the user's code did not do that specifically. Note that HttpContext.GetRequestBodyAsStringAsync(); and its friends can be called even on requests that cannot have a body (e.g. GET), so a generic solution for all verbs should be possible

Setup:

OS: Windows
C# HttpClient on .netFramework 4.7.2
EmbedIO 3.4.3

rdeago commented 2 years ago

Hello @bdurrer, nice to see you opened an issue here after we discussed this on Slack.

It makes sense (kind of) that HttpClient wants the request body to be read: for the connection to be reused for another request, it must obviously be flushed in both directions.

On Slack you also mentioned how Postman is not affected by this problem. I think this is because Postman sends a single request and does not care about reusing the connection.

Although a ReadAndDiscardRequestBody() method would work as a workaround, the real solution is automatically reading the request body (if not already read) before sending response headers.

Better yet, we could read the request body before even trying to route the request; that's what PHP does IIRC. Obviously PHP has a configurable limit on request body size, which we could implement too. PHP also automatically parses uploaded files (I think they are exempt from the body size limit btw) and stores them in a temporary directory; that, too, would be nice to have, and maybe even necessary.

The problem lies in the "before sending response headers" part, as we don't know when exactly HttpListener will send them, unless we block all direct access to HttpListener methods. This is obviously a breaking change.

To recap, there are two possible solutions:

Always read the request body before processing the request; make the data available through the HTTP context object; have a configurable size limit for request bodies; parse and save uploaded files as they don't count against the limit; (plus probably more stuff I'm not considering at the moment);
Read the request body on demand; don't make the listener's Request and Response objects available to applications, ever, in any form, so as to be in control of when response headers are sent; throw an exception if request body contents are requested after being read and discarded; etc.

I personally lean on the side of 1: more complicated but it looks more like a solution as opposed to a convoluted workaround.

Thoughts, anyone? @geoperez @mariodivece @radioegor146 @madnik7 @rocketraman @bufferUnderrun

radioegor146 commented 2 years ago

My personal opinion is that we should not read the whole body. In the case of a big amount of requests with the handler not reading the body (for example returning 403 based only on cookies) will consume large amounts of memory and CPU time. If I understood correctly - the library is posing itself as an "embedded solution", and I've used it for some IoT devices. Also, not reading the full body is a great solution to attacks, as we will consume any excess CPU/RAM only if required.

geoperez commented 2 years ago

Hello, I agreed with @radioegor146. Access to I/O should be minimal to avoid performance issues.

rdeago commented 2 years ago

@radioegor146 @geoperez thanks a lot for chiming in.

Reading on demand is in fact simpler, and not reading at all is even simpler than that 😄 but the question is: what do we do if there is a request body and no handler reads it?

Some quick proposals:

Leave everything as is and mark this edge case as a probable bug in HttpClient.
After the request is handled, check the request's Content-Length header (good luck for chunked requests, but this is an edge case after all) and if there appears to be data that has not been read, and the connection is bound to be reused, read and discard the request body. Add an option to WebServerBase for the size limit of request bodies to mitigate possible attacks.
Instead of reading an unread request body, set the response's Connection header to close and do not reuse the connection. This will probably not solve the problem with HttpClient, but it could be a sensible thing to do anyway. The problem here is, the response headers at that time will be already sent in 99.9% of cases. (Not in EmbedIO 4.0 though.)
...any more, anyone?

Also, is anyone willing to research whether this HttpClient behavior is a bug (so we're discussing the gender of angels, as the Italian saying goes) or is dictated by some RFC? Unfortunately I'm very tight on time these days.

geoperez commented 2 years ago

LOL, why are Italians discussing that? Anyway, number two could be a better approach. A mix between reading if the length is small (set a custom threshold) and closing the connection if the length is shady.

rdeago commented 2 years ago

So, number 2 with number 3 as a fallback. Looks good. 👍

\<OffTopic>

LOL, why are Italians discussing that?

🤣🤣🤣 @geoperez ¡gracias por alegrarme el día!

To "discuss the gender of angels" in Italian means to discuss an idle, useless topic, wasting time that could be better spent.
(Italian Wikipedia, translated by Google)
Italians tend to do it a lot when commenting soccer matches. 😁

\</OffTopic>

On second thought, however, there are no angels involved here. EmbedIO currently has no protection against the type of attacks mentioned by @radioegor146. A malicious client may send a 4Gb POST body to an EmbedIO server running on a Raspberry Pi, and we'll happily try to read it all and probably deserialize it too! 😱💥💥💥🙈

Since this is not the same problem reported by @bdurrer (although clearly related), should I open another issue @geoperez?

radioegor146 commented 2 years ago

I'm staying with the second option. I'm sure, that if there is an HTTP request, you can obviously get the size of the data (by reading it at least partially). And for chunked requests, can't we just read chunk size, skip chunk bytes, and so on? Like we can read only some parts of data that provide length.

geoperez commented 2 years ago

@rdeago probably yes.

unosquare / embedio

Post Request Body must be read #558