If a validator prohibits some byte, e.g. space (0x20), in a URI, then a request can bypass it using simple hex encoding: `GET /foo%20bar HTTP/1.1`.
One dangerous real-life example is a response splitting attack:

```
/redir_lang.jsp?lang=foobar%0d%0aContent-Length:%200%0d%0a%0d%0aHTTP/1.1%20200%20OK%0d%0aContent-Type:%20text/html%0d%0aContent-Length:%2019%0d%0a%0d%0a<html>Shazam</html>
```
Allowed characters (bytes) must be taken from the same configuration options as for #628.
The encodings must be validated, see for example `validate_url_encoding()` from ModSecurity/apache2/re_operators.c.
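For illustration, a minimal sketch of such a check in the spirit of ModSecurity's validator (hypothetical helper names, contiguous buffer assumed; the real parser has to work over skb fragments):

```c
#include <stdbool.h>
#include <stddef.h>

static bool
is_hex(unsigned char c)
{
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')
	       || (c >= 'A' && c <= 'F');
}

/*
 * Return true if every '%' in @s is followed by exactly two hex
 * digits, i.e. the string is a well-formed percent-encoded sequence.
 */
static bool
url_encoding_valid(const unsigned char *s, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (s[i] != '%')
			continue;
		if (i + 2 >= len || !is_hex(s[i + 1]) || !is_hex(s[i + 2]))
			return false;
		i += 2; /* skip the validated hex digits */
	}
	return true;
}
```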
Traffic normalization for intrusion detection is well studied; see for example *Network Intrusion Detection: Evasion, Traffic Normalization, and End-to-End Protocol Semantics* for L3-L4 NIDS.
The Huffman decoder and encoders should be reviewed: at the moment we use a 1-character decoding table, which shows better performance than the nghttp2 and Nginx decoders (https://github.com/tempesta-tech/tempesta/pull/1176#discussion_r257840173). However, LiteSpeed uses large tables and batching to speed up Huffman encoding and decoding. Probably the allowed characters (in the sense of #628), already decoded (in the sense of this issue), can be encoded in the large table. Also see #1207.
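Purely illustrative, one possible shape of such a combined table entry (not the actual Tempesta or LiteSpeed layout):

```c
#include <stdint.h>

/*
 * Hypothetical entry of a combined Huffman decoding table: besides
 * the decoded symbol and the next DFA state, it carries the allowed
 * character-class bits (in the sense of #628), so a decoded byte can
 * be validated with a single AND against the location's class mask
 * instead of a second table lookup after decoding.
 */
typedef struct {
	uint8_t		next;	/* next decoder state */
	uint8_t		flags;	/* EMIT / ACCEPT / FAIL */
	uint8_t		sym;	/* decoded character */
	uint8_t		cclass;	/* character-class bits for validation */
} hf_entry_t;
```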
To avoid hurting performance in cases which don't require strong security, the feature should be configurable per-vhost and per-location in the same sense as #688.
The transformation logic (as described in RFC 7230 5.7.2) for cookies and URI must be controlled by a configuration option (see also #902):

```
http_norm <uri|cookie>
content_security_mode <strict|transform|log>
```

e.g.

```
http_norm uri cookie;
content_security_mode strict;
```
The following checks and transformations must be done:

- `url` - decode percent-encoded strings (double percent hex and first/second/double nibble hex aren't allowed). Messages with wrong hex strings (e.g. `http://foo.com/?%product=select 1,2,3`, where `%` isn't followed by 2 hex digits) must be blocked. Spaces may be represented in many ways, e.g. with `+` or `%20` (see the HTML URL Encoding Reference) - we don't need to do anything with this. RFC 3986 allows percent encoding in all parts of a URI, but it's unclear how to deal e.g. with a UTF-8 hostname, so we decode the URI abs_path only.
- `utf8` - validate the UTF-8 encoding: decode the percent-encoded bytes and validate them as UTF-8.
- `path` - remove path traversals like `/../` or `//` (see `ngx_http_parse_complex_uri()`) and translate `\` to `/`; see the sketch after this list.
- `pollution` (subject for #1276) - take the 1st polluted HTTP parameter for URI or POST in `content_security_mode=transform` mode. In validation mode (w/o the `content_security_mode=log` attribute) just build a map of the parameters and ensure that there is no pollution. In `content_security_mode=transform` mode rewrite the URI (available for URIs only); in `content_security_mode=strict` drop the request and write a warning. HTTP parameter fragmentation, e.g. `http://test.com/url?a=1+select&b=1+from&c=base`, is left for an application-side WAF.
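A simplified sketch of the `path` transformation (hypothetical helper, contiguous buffer, percent-decoding already applied; the real code must run over skb fragments):

```c
#include <stddef.h>
#include <string.h>

/*
 * Normalize @p of length @len in place: translate '\' to '/',
 * collapse "//" and resolve "." and ".." segments. Returns the new
 * length, or -1 if ".." tries to climb above the root.
 */
static long
path_normalize(char *p, size_t len)
{
	size_t r = 0, w = 0;

	while (r < len) {
		size_t seg, n;

		/* Skip separators ('\' is treated as '/'). */
		while (r < len && (p[r] == '/' || p[r] == '\\'))
			r++;
		seg = r;
		while (r < len && p[r] != '/' && p[r] != '\\')
			r++;
		n = r - seg;

		if (!n || (n == 1 && p[seg] == '.'))
			continue; /* empty or "." segment */
		if (n == 2 && p[seg] == '.' && p[seg + 1] == '.') {
			if (!w)
				return -1; /* escapes the root */
			/* Pop the previously written segment. */
			while (w && p[w - 1] != '/')
				w--;
			if (w)
				w--;
			continue;
		}
		p[w++] = '/';
		memmove(p + w, p + seg, n);
		w += n;
	}
	if (!w)
		p[w++] = '/';
	return w;
}
```

E.g. `/a/b/abba/../abba` normalizes to `/a/b/abba`, and `/..` is rejected with -1.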
Additional alphabets must be introduced to validate the strings after all the decodings. These alphabets may prevent double percent encodings (e.g. `%2541`, which is essentially `%41` after the first hex decoding and `A` after the second) by prohibiting `%`.
`path` must be executed after string decoding, e.g. the path `/a/b/abba/%2e%2e/abba` must be decoded and only then the `..` removed. Also, the allowed alphabets must be verified after the decodings to block messages with CR, LF, or a zero byte.
If none of the normalization options is specified, then the HTTP parser must not perform detailed processing and just validate the allowed alphabet as now, i.e. there must be zero performance overhead if normalization isn't required by the configuration.
All the decoders, in the `log` and `strict` modes, must copy the observed string to some buffer, because we need to forward the percent-encoded URI as-is. Since all the encodings are larger than the original data, the `content_security_mode=transform` mode must recode the decoded string in place, rewriting the original string. skb fragmentation should be used to handle the data gap between the shortened URI and the `HTTP/` part. The fragmentation must be done only once, when all the decoders finish. Fall back to full data copying if the number of fragments per buffer (#498) grows beyond a compile-time constant.
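Since a `%XX` triple always decodes to a single byte, in-place decoding is safe: the write pointer trails the read pointer. A minimal sketch under the same contiguous-buffer assumption (hypothetical helper, not existing Tempesta code):

```c
#include <stddef.h>

static int
hex_val(unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if ((c | 0x20) >= 'a' && (c | 0x20) <= 'f')
		return (c | 0x20) - 'a' + 10;
	return -1;
}

/*
 * Percent-decode @s of length @len in place. The decoded form is
 * never longer than the encoded one, so the write pointer safely
 * trails the read pointer. Returns the new length, or -1 on a
 * malformed sequence ('%' not followed by two hex digits).
 */
static long
percent_decode_inplace(unsigned char *s, size_t len)
{
	size_t r, w = 0;
	int hi, lo;

	for (r = 0; r < len; r++) {
		if (s[r] != '%') {
			s[w++] = s[r];
			continue;
		}
		if (r + 2 >= len || (hi = hex_val(s[r + 1])) < 0
		    || (lo = hex_val(s[r + 2])) < 0)
			return -1;
		s[w++] = (hi << 4) | lo;
		r += 2;
	}
	return w;
}
```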
The normalization must be done before the cache processing to minimize the number of different URI keys stored in the cache.
Since it's undesirable to grow the current HTTP parser state set, the logic must be implemented in a pluggable (external) FSM entered by a conditional unlikely jump (no need to support the compilation directive any more).
Also, please fix the TODO for the URI abs_path for more accurate filtering of injection attacks, e.g. it'd be good to be able to prohibit `/` in the query string.
There are SIMD implementations of UTF-8 validation and recoding (e.g. to/from UTF-16 or UTF-32). However, it probably makes sense to sacrifice SIMD and do percent-decoding, UTF-8 validation, validation of allowed character sets (in the sense of #628), and the transformations (path or arguments) in a single pass.
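In the scalar single-pass variant, the allowed-set check reduces to a 256-bit bitmap test per decoded byte; a sketch (hypothetical `tfw_cset_t`, not existing Tempesta code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 256-bit bitmap of allowed bytes, in the sense of #628. */
typedef struct { uint64_t w[4]; } tfw_cset_t;

static inline bool
cset_test(const tfw_cset_t *cs, unsigned char c)
{
	return cs->w[c >> 6] & (1ULL << (c & 63));
}

/*
 * Validate decoded bytes against the allowed set. In a real
 * single-pass implementation this check would be fused into the
 * percent-decoding loop, so each byte is touched only once.
 */
static bool
cset_validate(const tfw_cset_t *cs, const unsigned char *s, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (!cset_test(cs, s[i]))
			return false;
	return true;
}
```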
Please update the https://github.com/tempesta-tech/tempesta/wiki/Web-security Wiki page on finishing the task.
An example of a CRLF injection request that the normalization must block:

```
GET /vulnerabilities/xss_d/?default=/%0aSet-Cookie:crlf-injection HTTP/2
```
We leave back-end server personality normalization for further development, if there are any real requests for it. Probably it won't be needed, since we're going to provide full Web server functionality and leave really heavy processing logic to dedicated WAF solutions.
HTTP responses also aren't normalized - we target filtering the initial attacks rather than filtering their consequences.
Also, the decoder set is very restricted, e.g. there is no lower-case conversion, Microsoft %U decoding, or Unicode normalization, so please keep in mind possible further extensions of the decoders.
The issue was wrongly closed.
We need at least minimal HTTP request normalization (like `ngx_http_parse_complex_uri()` from Nginx does). The normalization must be implemented as part of the current HTTP FSM (to avoid double processing as in Nginx) and write the normalized fields to the appropriate parts of `TfwHttpReq`.
Normalization depending on the back-end server personality must also be done; however, this is very customizable and expensive logic, so it should be possible to switch the functionality off. Thus it must be implemented in `http_norm.h` as pluggable HTTP FSM states.
Basically, we shouldn't do the normalization if `Cache-Control: no-transform` is present.

Depends on #902. Linked with #1207.