krizhanovsky commented 6 years ago

The POST body processing option must be configured per-location and per-vhost by a new configuration option

http_post_validate=N

Where N is maximum content length, 0 by default means process all POST requests w/o any upper limit for the length. By default, if the option is missed in the configuration file, we should not perform any POST validation.

We have to process POST body, at least boundary. Empty and doubling boundary must be correctly handled e.g.:

POST /vuln.php HTTP/1.1
Host: test.com
Connection: close
Content-Type: multipart/form-data; boundary=
Content-Length: 192

--
Content-Disposition: form-data; name="id"

123' or sleep(20)# 
----

POST /vuln.php HTTP/1.1 
Host: test.com
Content-Type: multipart/form-data; boundary=FIRST;
Content-Type: multipart/form-data; boundary=SECOND;
Content-Type: multipart/form-data; boundary=THIRD;

--THIRD 
Content-Disposition: form-data; name=param 

UNION SELECT version()
--THIRD--

POST /vuln.php HTTP/1.1 
Host: test.com
Content-Type: multipart/form-data; xxxboundaryxxx=FIRST; boundary=SECOND;

--FIRST 
Content-Disposition: form-data; name=param 

UNION SELECT version()
--FIRST--

POST /vuln.php HTTP/1.1 
Host: test.com
Content-Type: multipart/form-data; boundary=FOO;

--FOO
Content-Disposition: form-data; name=param1

--FOO
Content-Disposition: form-data; name=param2

UNION SELECT version()
--FOO--

See more cases in https://www.slideshare.net/ssusera0a306/zeronights-2016-a-blow-under-the-belt-how-to-avoid-wafipsdlp-wafipsdlp and https://blog.qualys.com/wp-content/uploads/2012/07/Protocol-Level%20Evasion%20of%20Web%20Application%20Firewalls%20v1.1%20(18%20July%202012).pdf

A new configuration option must be introduced

content_security_mode <strict|transform|log>

The configuration option influences all RFC-undefined HTTP content mutation with probably security issues. strict means just drop a request, i.e. if RFC doesn't allow some parameter to be doubling, then just drop the request and write a warning to log. transform takes the first occurrence and ignores all the following, and write a warning log message. log just write a warning message, as both other modes, and leaves everything as is. Very important list all the cases affected by the option in Wiki, that's very crucial for debugging probably issues with web application. Also add description of the attacks to the Web security wiki with examples and use cases.

Tempesta must not allow multiple same-name parameters in a Content-Disposition part header, doubling.

POSTs can be pretty large and have many parameters. So need a good string search algorithm, BM or an AVX2 matcher. Need to test the algorithm performance on different part sizes (it must be fast for small parts as well). The matching strings can be chunked on network or HTTP transfer encoding layers, e.g.

POST / HTTP/1.1
Host: destinationBucket.s3.amazonaws.com
Content-Type: multipart/form-data; boundary=9431149156168
Transfer-Encoding: chunked

70\r\n
--9431149156168
Content-Disposition: form-data; name="key"

acl
--943
161\r\n
1149156168
Content-Disposition: form-data; name="tagging"

<Tagging><TagSet><Tag><Key>Tag Name</Key><Value>Tag Value</Value></Tag></TagSet></Tagging>
--9431149156168--

There are 2 chunks of size 70 and 161 and the chunks boundary is at the multipart boundary identifier. The search algorithm must store current state on a chunk boundary and continue on the next chunk.

Please check standards and implementations for boundary usage. Apparently it's only for Content-Type: multipart/form-data - web forms which aren't so large and also have many attack vectors. So we should not go into a situation when we're scanning a large blob (e.g. a DVD image) for boundary. In normal case of course. Passing wrong content is addressed by #1119.

Required for #2. Both the issues care only about strict (just drop bad requests) and transform (rewrite a request using the first occurrence to save resources) modes. We do not care about particular end-point application personalities, e.g. differences between parsing by ASP or PHP.

Functional test is described in #843 - please implement the appropriate checkbox running all the relevant POST attacks from both the links above.

Testing

The appropriate test issue for the feature https://github.com/tempesta-tech/tempesta-test/issues/108

krizhanovsky commented 6 years ago

While #628 implements alphabet checking, for now we do not parse POST body and just skip it, so this issue must use alphabet checking from #628 and implement the POST validation.

Please update Custom character sets wiki page for the new POST processing functionality.

i-rinat commented 5 years ago

There is a evasion attack based on character escaping in field parameters. According to the RFC 7231, boundary parameter may be a "quoted-string". Something like "aaa\\bbb" should be decoded to aaa\bbb, but not all application servers do that. PHP, for example.

In the request below, PHP will see "Hello, PHP!", while application with a conformant parser will see "Hello, stranger!":

POST /endpoint HTTP/1.1
Host: loc
Content-Type: multipart/form-data; boundary="aaa\\bbb"
Content-Length: 177

--aaa\bbb
Content-Disposition: form-data; name="param"

Hello, stranger!
--aaa\bbb--
--aaa\\bbb
Content-Disposition: form-data; name="param"

Hello, PHP!
--aaa\\bbb--

krizhanovsky commented 5 years ago

Parts of the issue are implemented in pull requests #1139 and #1154. The short summary for TODO:

[ ] ~http_post_validate must be integer instead of current boolean - we need a limitation for maximum POST scanning.~ There is http_body_len configuration option, which limits the maximum POST body size. There is no sense to keep two size limits since this might lead to a security hole, when the tail of a POST isn't validated. Let's leave http_post_validate binary. Please write an example and allowed values range in etc/tempesta_fw.conf and description in the Wiki. Mention a linkage (preferably with a configuration example) how http_post_validate is linked with maximum POST size, so that we can limit maximum POST size and always analyze whole POST.
[ ] everything around content_security_mode (also keep in mind documentation requirements as above)
[x] ~all the sanitization logic.~ Let's skip the transform mode. At the moment we rewrite Content-Type and that's enough. Rewriting POST parameters body may cause invisible application errors and rises a lot of ambiguous replacements. Let's postpone this for at least one real user request.

dsvi commented 2 years ago

For multipart data encoding see RFC2046 For multipart form data see RFC1867

krizhanovsky commented 2 years ago

RFC 1867 is obsoleted by 2854, but it's about text/html and seems use MIME just in example. The real references for us must be RFC 2045, 2046 and maybe 2049.

Following implementations can be used as reference for multipart parser:

The second one deals with nested multipart bodies. I don't think that HTTP ever uses this. It'd be good to check some RFC for this or source code of some mature web project like Django, Node.js or PHP whether they parse nested parts (I checked Ngixn Unit and it seems it leave POST multipart processing for the application logic).

As noted above we should not transform the POST body, just parse the body and log warnings or block a request (in strict mode) it it has parameters pollution or broken structure.

With this task we only need to parse the POST multipart body and (1) validate it and (2) build an internal data structure to represent all the parameters. See for example AWS S3 POST structure.

The data structure representing the multipart body is TBD, but consider following options:

we can keep all the parts in a sorted array by e.g. first 8 bytes of the name parameter. The array can exponentially grow say from size 8. Also consider some hashing if it can be faster than binary search.
it seems there are only 3 possible headers: Content-Type, Content-Disposition and Content-Transfer-Encoding, which seems can just be parsed with current __parse_transfer_encoding() (BTW please update the comment for the function since RFC 2616 is obsolete now). Please use existing parsing functions to parse the headers
Since there are only 3 possible headers, we should keep then in header-id indexed array of size 3 (just as we do now for special headers) So we should have an array of pointers to data structures each with 3 headers and TfwStr for the body.

Dmitry-Gouriev commented 2 years ago

@krizhanovsky RFC 7578: sect. 4.3 and 5.2 explicitely allow multiple form fields with identical field names.

Dmitry-Gouriev commented 2 years ago

@krizhanovsky Also Content-Transfer-Encoding is deprecated (RFC 7578 sect 4.7)

tempesta-tech / tempesta

POST body processing #902

Testing