Open krizhanovsky opened 9 years ago
HTTPtables implement 2 logical operators:
tfw_http_tbl_scan() spins in a loop while there is another chain for the current action. This is just one example of the overhead introduced by executing simple operations through C data structures and loops.
These operators can be implemented more efficiently: compile the rules into binary code and return from the function as soon as we're done. This is what eBPF actually does. With the new implementation we need to implement the basic operations, e.g. string matching and functions for the actions, and glue them all together with eBPF.
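To make the overhead argument concrete, here is a minimal user-space sketch contrasting the two approaches. It is not Tempesta's actual code: `struct rule`, `struct chain`, `scan_interpreted()`, `scan_compiled()` and the two-chain `/admin` policy are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum action { ACT_PASS, ACT_BLOCK, ACT_JUMP };

struct chain;

/* Hypothetical rule: match a URI prefix, then pass, block or jump. */
struct rule {
	const char *prefix;        /* NULL matches any URI */
	enum action act;
	const struct chain *next;  /* target chain for ACT_JUMP */
};

struct chain {
	const struct rule *rules;
	size_t n;
};

/* Example policy: /admin is blocked except for /admin/public. */
static const struct rule admin_rules[] = {
	{ "/admin/public", ACT_PASS, NULL },
	{ NULL, ACT_BLOCK, NULL },
};
static const struct chain admin_chain = { admin_rules, 2 };

static const struct rule root_rules[] = {
	{ "/admin", ACT_JUMP, &admin_chain },
	{ NULL, ACT_PASS, NULL },
};
static const struct chain root_chain = { root_rules, 2 };

/* Interpreted evaluation: walk the rule arrays and follow jumps in a loop. */
static enum action
scan_interpreted(const struct chain *c, const char *uri)
{
	while (c) {
		const struct chain *next = NULL;
		size_t i;

		for (i = 0; i < c->n; i++) {
			const struct rule *r = &c->rules[i];

			if (r->prefix
			    && strncmp(uri, r->prefix, strlen(r->prefix)))
				continue;
			if (r->act != ACT_JUMP)
				return r->act;
			assert(r->next);
			next = r->next;
			break;
		}
		c = next;
	}
	return ACT_PASS;
}

/* The same policy written as the straight-line code a compiler could emit:
 * no rule arrays, no chain pointers, return as soon as a decision is made. */
static enum action
scan_compiled(const char *uri)
{
	if (!strncmp(uri, "/admin", 6)) {
		if (!strncmp(uri, "/admin/public", 13))
			return ACT_PASS;
		return ACT_BLOCK;
	}
	return ACT_PASS;
}
```

Both functions give the same verdict for any URI under this policy; the compiled form simply has no data structures left to traverse at run time.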
The language should use an approach similar to bpftrace: compile a C-like language with some Python-like syntactic sugar into a C program, which is then compiled into BPF.
The language scripts must be able to operate on general-purpose Tempesta DB hash tables. We should not use the BPF hashes because we also need to be able to operate on them using tdbq and/or the REST API (e.g. see the HAProxy bot protection examples that use stick tables).
HTTPtables develops functionality. There is no need to work with the TCP and IP layers, since we have integration with Netfilter, which already does this perfectly.
Once the HTTPtables architecture is reworked for better performance, we can close the issue.
i'm not sure how fast tempesta is (hvnt tested) but
if u really need wasm, u shld still hv an af_xdp adoption layer too, and make it rust i guess.
u shld look at bytedance/monoio and be in line with their implementation compatibility if possible coz they try to go for the ultimate performance for web server.
everything i've mentioned shld take quic into future consideration and all of which i have mentioned does in their own way. u can also look at pantuza equic github for "future" compatibility issues etc.
everything mentioned is vague but what i am emphasizing for tempesta to move forward faster is...
wider adoption through compatibility with existing tool chains (or at least more compatible so modification is lesser)
better standardization to leverage on the development of future toolchain.
doing so can move tempesta dev faster and easier migration to quic etc in future.
This relates to Tempesta xFW, the enterprise volumetric DDoS mitigation module.
Tempesta Language (TL) is a DSL for L4-L7 network data processing. While L3 data is visible to TL programs, TL is not intended to work on L3 because of its higher overhead compared with eBPF and nftables. TL programs run in softirq context, so they cannot sleep or block.
A JIT language must be implemented for dynamic filtering and classification rules, traffic transformation, and whatever else anyone wants. The language must be able to implement Frang, sticky cookies, load balancing and a few other current features in a more robust way.
Consider the following extract from access.log for a real-world DDoS attack:
In this case the following rules might be helpful:
Here are a couple of examples of the assumed implementation to give a sense of the language. SSL Heartbleed can be filtered out by the following expression:
The problem with iptables and eBPF is that these tools work on separate skbs, so their rules are easy to evade by splitting TCP segments into multiple IP packets. Thus, TL must work on the TCP stream and higher layers. Internally, packet boundaries must be handled by storing the current matching state of an FSM, i.e. we need a Turing-complete language, which eBPF is not. Also, eBPF uses a restricted instruction set and its programs are limited to 4K instructions. These restrictions aren't good. Thus, the better way is a SystemTap-like one: compile TL into a C kernel module and run it with user-space interfaces (like eBPF maps) using Kernel-User Space Transport.
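The point about storing the FSM matching state across packet boundaries can be sketched in plain user-space C. This is only an illustration, not TL or Tempesta code: `struct stream_matcher`, `matcher_init()` and `matcher_feed()` are hypothetical names, and Knuth-Morris-Pratt is used here simply as one convenient way to get a resumable matcher.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAT_MAX 64

/* Hypothetical per-connection matcher: the pattern, its KMP failure table
 * and the number of pattern bytes matched so far (the saved FSM state). */
struct stream_matcher {
	const char *pat;
	size_t len;
	size_t fail[PAT_MAX];
	size_t state;
};

static void
matcher_init(struct stream_matcher *m, const char *pat)
{
	size_t i, k = 0;

	m->pat = pat;
	m->len = strlen(pat);
	assert(m->len > 0 && m->len <= PAT_MAX);
	m->state = 0;
	m->fail[0] = 0;
	for (i = 1; i < m->len; i++) {
		while (k > 0 && pat[i] != pat[k])
			k = m->fail[k - 1];
		if (pat[i] == pat[k])
			k++;
		m->fail[i] = k;
	}
}

/* Feed one segment. Returns 1 as soon as the pattern completes inside this
 * segment, possibly having started in an earlier one; returns 0 otherwise. */
static int
matcher_feed(struct stream_matcher *m, const char *data, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		while (m->state > 0 && data[i] != m->pat[m->state])
			m->state = m->fail[m->state - 1];
		if (data[i] == m->pat[m->state])
			m->state++;
		if (m->state == m->len) {
			/* Keep the state valid for further matching. */
			m->state = m->fail[m->len - 1];
			return 1;
		}
	}
	return 0;
}
```

If a request is split so that one segment ends with `User-Ag` and the next starts with `ent: bot`, a matcher that keeps `state` between calls still finds `User-Agent: bot`, while a per-packet matcher restarted on each segment does not.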
Another example from CloudFlare is
The rule matches only the User-Agent value, starting from the end, instead of scanning the whole packet as the iptables string module or eBPF do.
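As a rough illustration of why matching from the end is cheap, the check reduces to one bounded comparison over an already parsed header value. The function name `value_ends_with()` and the sample strings used with it are hypothetical; the actual rule from the example is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the header value ends with the given suffix.
 * Only the last suffix_len bytes of the value are examined. */
static int
value_ends_with(const char *val, size_t val_len,
		const char *suffix, size_t suffix_len)
{
	assert(val && suffix);
	if (suffix_len > val_len)
		return 0;
	return memcmp(val + val_len - suffix_len, suffix, suffix_len) == 0;
}
```

The cost depends on the suffix length only, not on the size of the packet or of the header value.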
While the current Frang rule set duplicates iptables functionality (e.g. connection limiting), we still need to account for such low-level information to be able to specify complex multi-layer rules, e.g. "block a client with more than 10 connections for the last minute and without User-Agent header".
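The quoted multi-layer rule could be expressed roughly as follows. This is a sketch under assumptions: `struct client_acct`, `acct_new_conn()` and `should_block()` are hypothetical names, time is passed in as plain seconds, and "the last minute" is approximated with a fixed 60-second window rather than a true sliding one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CONN_WINDOW_SEC	60
#define CONN_LIMIT	10

/* Hypothetical per-client accounting kept by the connection (L4) layer. */
struct client_acct {
	uint64_t window_start;	/* start of the current window, seconds */
	unsigned int conns;	/* connections opened in that window */
};

/* Called on every new connection from the client. */
static void
acct_new_conn(struct client_acct *c, uint64_t now)
{
	assert(now >= c->window_start);
	if (now - c->window_start >= CONN_WINDOW_SEC) {
		c->window_start = now;
		c->conns = 0;
	}
	c->conns++;
}

/* Called at the HTTP (L7) layer once the request headers are parsed:
 * combines the connection count with the absence of User-Agent. */
static bool
should_block(const struct client_acct *c, uint64_t now, bool has_user_agent)
{
	bool window_live = now - c->window_start < CONN_WINDOW_SEC;

	return window_live && c->conns > CONN_LIMIT && !has_user_agent;
}
```

Neither condition alone triggers the block; the rule only makes sense when the L4 counter is visible to the L7 check.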
Basically, TL must provide filtering abilities very close to those of Suricata. However, since Tempesta FW is a TCP endpoint, it isn't vulnerable to IDS evasion techniques. Also, the overall system processes HTTP only once (instead of processing it at both the IDS and a Web accelerator), and there is no need to place an SSL terminator before the IDS, which would introduce multiple data copies and context switches.
The engine must also provide rewrite logic, at least for HTTP headers, but also for arbitrary HTTP message parts (e.g. to implement SSI/ESI extensions).
Since relatively complex logic is expected to be implemented in TL, TL programs must be implemented as GFSM subroutines, explicitly or implicitly, using high-level language constructions like the yield operator.
Also consider WASM extensions; see Envoy as an example. There is an ABI specification.
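Returning to the yield operator mentioned above, one way to picture it is as a resumable C subroutine: it returns a "yield" code when it runs out of data and continues from its saved state on the next call. The sketch below is only an assumption about what such lowering could look like; `enum tl_ret`, `struct tl_sub` and `tl_wait_headers_end()` are hypothetical names and not part of GFSM.

```c
#include <assert.h>
#include <stddef.h>

enum tl_ret { TL_YIELD, TL_DONE };

/* Saved state of the subroutine: how many bytes of "\r\n\r\n" are matched. */
struct tl_sub {
	unsigned int matched;
};

/* Waits for the end of HTTP headers. Returns TL_DONE once "\r\n\r\n" has
 * been seen in the stream, or TL_YIELD if the chunk ended first, in which
 * case the caller invokes it again with the next chunk.
 * *used receives the number of bytes consumed from this chunk. */
static enum tl_ret
tl_wait_headers_end(struct tl_sub *s, const char *data, size_t n,
		    size_t *used)
{
	static const char eoh[] = "\r\n\r\n";
	size_t i;

	assert(s->matched < 4);
	for (i = 0; i < n; i++) {
		if (data[i] == eoh[s->matched])
			s->matched++;
		else
			s->matched = (data[i] == '\r') ? 1 : 0;
		if (s->matched == 4) {
			s->matched = 0;
			*used = i + 1;
			return TL_DONE;
		}
	}
	*used = n;
	return TL_YIELD;	/* out of data: the implicit yield point */
}
```

Because nothing sleeps between calls, a subroutine of this shape is compatible with the softirq constraint: it always returns to the caller and keeps all of its progress in `struct tl_sub`.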