tempesta-tech / tempesta

All-in-one solution for high performance web content delivery and advanced protection against DDoS and web attacks
https://tempesta-tech.com/
GNU General Public License v2.0

Multi-pattern regular expressions #496

Open krizhanovsky opened 8 years ago

krizhanovsky commented 8 years ago

Tempesta FW core must implement multi-pattern regular expressions to efficiently handle HTTP matching rules for filtering and configuration (see, for example, #471, #495, #530, and #1544, plus the matching of many ignored headers for caching in #1550). Intel Hyperscan can be used as a reference or as the foundation for the feature.

ReDoS must be considered by the implementation. It seems that back and forward references should be limited or fully prohibited, and resource consumption should be bounded in the sense of #488.
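As a rough illustration of the ReDoS point, here is a minimal Python sketch of a configuration-time check that rejects risky constructs before a pattern is ever compiled. It assumes that "back and forward referencing" covers numbered and named backreferences as well as lookaround assertions. The function name `find_redos_risks` and the limit `MAX_PATTERN_LEN` are hypothetical and are not part of Tempesta FW or Hyperscan.

```python
# Hypothetical limit on pattern size; the real bound would follow #488.
MAX_PATTERN_LEN = 256

# Group openers that start a lookahead or lookbehind assertion.
LOOKAROUND_OPENERS = ("(?=", "(?!", "(?<=", "(?<!")


def find_redos_risks(pattern: str) -> list[str]:
    """Return the risky constructs found in `pattern` (empty list if none).

    The scan is deliberately conservative: it does not track character
    classes, so something like "[(?=]" is flagged as well.
    """
    risks = []
    if len(pattern) > MAX_PATTERN_LEN:
        risks.append("pattern too long")
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\":
            nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
            # \1 .. \9 is a numbered backreference; \\ is a literal backslash.
            if nxt.isdigit() and nxt != "0":
                risks.append("backreference")
            i += 2  # skip the escaped character
            continue
        if pattern.startswith("(?P=", i):
            risks.append("named backreference")
        elif pattern.startswith(LOOKAROUND_OPENERS, i):
            risks.append("lookaround")
        i += 1
    return risks
```

A rule loader could refuse any pattern for which this returns a non-empty list. This only filters syntax; bounding the matching cost itself still depends on the engine.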

This should be done close to or together with #732, since simple multi-pattern matching is a sub-task of multi-pattern regexps.

Since Tempesta FW deals with fields of parsed HTTP messages, in general we need (1) relatively simple regular expressions for (2) relatively short strings. E.g.

location ~ ^/(/category/foo/|dddd|ccccc|vvvv|aaaa)/
hdr "Referer" == "*.tempesta-tech.com/*"  -> base;

In most cases simple multi-pattern prefix/suffix matching is enough. There is definitely no need for PCRE. However, there could be tens of location rules with simple regexps, so multi-pattern regexps still make sense.
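To make the prefix case concrete, here is a minimal Python sketch of selecting one of many location rules in a single pass over the URI, using a trie with longest-prefix semantics. The class `PrefixTrie` and the rule names are hypothetical illustrations, not Tempesta FW code. Suffix matching could reuse the same structure by inserting and looking up reversed strings.

```python
class PrefixTrie:
    """Maps literal prefixes to rules and finds the longest matching prefix."""

    _END = object()  # sentinel key marking "a prefix ends at this node"

    def __init__(self):
        self.root = {}

    def add(self, prefix: str, rule: str) -> None:
        node = self.root
        for ch in prefix:
            node = node.setdefault(ch, {})
        node[self._END] = rule

    def longest_match(self, text: str):
        """Return the rule of the longest stored prefix of `text`, or None."""
        node = self.root
        best = node.get(self._END)
        for ch in text:
            node = node.get(ch)
            if node is None:
                break  # no stored prefix continues with this character
            if self._END in node:
                best = node[self._END]
        return best


# Hypothetical rules loosely modelled on the location example above.
trie = PrefixTrie()
trie.add("/", "default")
trie.add("/category/foo/", "rule_foo")
trie.add("/dddd/", "rule_dddd")
```

The lookup cost depends on the length of the URI only, not on the number of rules, which is the property that matters when there are tens of locations.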

The only functionality requiring relatively large input data (up to tens of kilobytes, hundreds of bytes on average) and complex regexps is WAF filtering rules against User-Agent, URI, Cookie, or other header values.

These two cases must be separated:

  1. a simple multi-pattern string search (e.g. Commentz-Walter or a SIMD algorithm) with begin/end bindings
  2. multi-pattern regexps, e.g. with a runtime ported from Hyperscan (done in https://github.com/G-Core/linux-regex-module)
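A minimal Python sketch of the first case follows. It uses Aho-Corasick in place of Commentz-Walter or a SIMD algorithm, simply because it is compact to show. Each literal carries begin/end binding flags that are checked once a match is found. The class `MultiPatternMatcher` and its tuple format are hypothetical and not taken from Tempesta FW or linux-regex-module.

```python
from collections import deque


class MultiPatternMatcher:
    """Aho-Corasick over literals given as (literal, bind_begin, bind_end)."""

    def __init__(self, patterns):
        self.patterns = list(patterns)
        self.goto = [{}]   # per-state transitions: char -> state
        self.fail = [0]    # per-state failure link
        self.out = [[]]    # per-state indexes of patterns ending here
        for idx, (literal, _, _) in enumerate(self.patterns):
            if not literal:
                raise ValueError("empty literals are not supported")
            state = 0
            for ch in literal:
                nxt = self.goto[state].get(ch)
                if nxt is None:
                    nxt = len(self.goto)
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][ch] = nxt
                state = nxt
            self.out[state].append(idx)
        # Breadth-first pass: compute failure links and merge outputs.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text: str) -> set:
        """Return indexes of all patterns matching `text` with their bindings."""
        hits = set()
        state = 0
        last = len(text) - 1
        for pos, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for idx in self.out[state]:
                literal, bind_begin, bind_end = self.patterns[idx]
                if bind_begin and pos - len(literal) + 1 != 0:
                    continue  # must start at the beginning of the string
                if bind_end and pos != last:
                    continue  # must end at the end of the string
                hits.add(idx)
        return hits


# Hypothetical rule set: a bound prefix, a bound suffix, a free substring.
matcher = MultiPatternMatcher([
    ("/category/foo/", True, False),
    (".tempesta-tech.com", False, True),
    ("admin", False, False),
])
```

All literals are checked in one pass over the input. Anything that needs real regexp operators would go to the second case instead.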

krizhanovsky commented 4 months ago

Let's just integrate with Hyperscan for now - Hyperscan should be good for simple patterns.

We also need to extend the HTTPtables tests to use regular expressions.

krizhanovsky commented 2 months ago

I forked the repo (https://github.com/tempesta-tech/linux-regex-module). The discussed TODO list is:

  1. make the repo work with the current 5.10 kernel or the next one, 6.8
  2. adjust tempesta.sh to load the module
  3. work with @RomanBelozerov on TODO (new) issues for packaging and CI
  4. adjust the locations and HTTPtables code to work with regexes

biathlon3 commented 1 month ago

Description of the installation process on Wiki