tempesta-tech / tempesta

All-in-one solution for high performance web content delivery and advanced protection against DDoS and web attacks
https://tempesta-tech.com/
GNU General Public License v2.0

Kernel-User Space Transport #77

Open krizhanovsky opened 9 years ago

krizhanovsky commented 9 years ago

Motivation and architecture

We need to export some logic to user space and/or third-party servers. User-space tasks must be done asynchronously to softirq processing, just like NF_QUEUE in netfilter. Examples are:

FastCGI, uwsgi and ICAP implement their own protocols, different from HTTP. None of the logic above should be considered core or mission-critical.

Thus, we should be able to pass some HTTP requests to user space for complex processing and get appropriate responses back from user space. A configuration option (similar to the HTTP scheduler rules using http_match) such as

    match user_space_offload uri prefix "/rest/";

should be used to pass a client request to a user-space processing daemon.
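Functionally, the rule above boils down to a URI prefix comparison. A minimal sketch of that classification, assuming a hypothetical `struct offload_rule` and `should_offload()` helper; this is not Tempesta FW's http_match code, which works on parsed and possibly fragmented strings rather than flat C strings:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical rule: "match user_space_offload uri prefix <prefix>;" */
struct offload_rule {
	const char *uri_prefix;
};

/*
 * Returns true if the request URI starts with the rule's prefix, i.e. the
 * request should be handed to the user-space processing daemon.
 */
static bool
should_offload(const struct offload_rule *rule, const char *uri)
{
	assert(rule->uri_prefix != NULL);
	return strncmp(uri, rule->uri_prefix, strlen(rule->uri_prefix)) == 0;
}
```

With the prefix "/rest/", a request for "/rest/users/1" is offloaded, while "/static/app.js" and even "/rest" (without the trailing slash) stay in the kernel.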

Zero-copy transport of HTTP messages between kernel and user space must be used, based on an mmap() interface for parsed HTTP messages. The proposed scenario for processing an ingress HTTP request and sending a generated HTTP response is illustrated by the figure below.

[Figure: user_kernel_comm]

  1. The softirq handler receives packets that hold an HTTP request. The Linux TCP/IP stack is patched so that the packet’s payload is always placed in memory pages, which can be mmap()’ed.
  2. The request is parsed and all required data, including the parsing meta-information and the packet’s data, are placed in several memory pages. HTTP messages are processed in a zero-copy fashion, i.e. HTTP fields are not copied. Instead, the parsing meta-information stores pointers into the received packet data, such as to the start of an HTTP header field name and value.
  3. When the memory pages of the HTTP request are mapped into the user-space process’ address space, the softirq handler wakes up the process.
  4. Now the process can run heavy logic on the mmap()’ed request. An example of heavy logic could be data compression.
  5. The advanced classification process can generate a response to the request (e.g. with an HTTP error code). The same memory-mapped region is used to pass the HTTP response to the kernel.
  6. Finally, the softirq handler sends the response to the client.
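The parsing meta-information of step 2 could look roughly like the sketch below. The struct and function names are hypothetical. One assumption deserves a loud label: the issue speaks of pointers, but since the kernel and the user-space process see the shared pages at different virtual addresses, this sketch stores offsets from the start of the mapped packet data, which each side turns into an address in its own mapping.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UMSG_MAX_HDRS 8

/* Hypothetical reference to bytes in the shared region: offset, not pointer. */
struct umsg_str {
	uint32_t off;	/* offset from the start of the mapped packet data */
	uint32_t len;
};

/* Hypothetical parsed header field; name and value refer to packet data. */
struct umsg_hdr {
	struct umsg_str name;
	struct umsg_str value;
};

/* Hypothetical parsing meta-information shared with user space. */
struct umsg_meta {
	struct umsg_str uri;
	uint32_t hdr_num;
	struct umsg_hdr hdrs[UMSG_MAX_HDRS];
};

/* Translate an offset into an address valid in this process' mapping. */
static const char *
umsg_ptr(const char *base, struct umsg_str s)
{
	return base + s.off;
}

/* HTTP field names are case-insensitive. */
static bool
eq_nocase(const char *a, const char *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return false;
	return true;
}

/* Look up a header value in place, without copying any packet data. */
static const struct umsg_str *
umsg_find_hdr(const char *base, const struct umsg_meta *m, const char *name)
{
	size_t n = strlen(name);

	assert(m->hdr_num <= UMSG_MAX_HDRS);
	for (uint32_t i = 0; i < m->hdr_num; i++) {
		const struct umsg_hdr *h = &m->hdrs[i];

		if (h->name.len == n && eq_nocase(umsg_ptr(base, h->name), name, n))
			return &h->value;
	}
	return NULL;
}
```

For the packet "GET /rest/x HTTP/1.1\r\nHost: example.com\r\n\r\n", the URI is { 4, 7 } and the Host field is name { 22, 4 } with value { 28, 11 }; the lookup returns the value descriptor without touching any other bytes.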

GFSM should be used to redirect HTTP messages satisfying the user_space_offload rule to user space and to wait for responses (e.g. modified HTTP requests for further forwarding in the ICAP case, or just a response in the RESTful API case).
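The transitions GFSM would have to drive for an offloaded message can be summarized as a small state machine. A sketch with hypothetical state and event names; it only models the two cases named above (ICAP-like modified request, RESTful response) and is not the actual GFSM interface:

```c
#include <assert.h>

/* Hypothetical states of an HTTP message subject to user-space offloading. */
enum off_state {
	OFF_PARSED,	/* request parsed in softirq */
	OFF_WAIT_USER,	/* queued to user space, softirq moved on */
	OFF_FORWARD,	/* send the (possibly modified) request to a backend */
	OFF_RESPOND,	/* send the user-space generated response to the client */
	OFF_DROP,	/* user space failed or timed out */
};

enum off_event {
	EV_RULE_MATCH,		/* user_space_offload rule matched */
	EV_RULE_MISS,		/* no rule matched, normal processing */
	EV_USER_REQUEST,	/* user space returned a modified request (ICAP) */
	EV_USER_RESPONSE,	/* user space returned a response (RESTful API) */
	EV_USER_ERROR,
};

static enum off_state
off_next(enum off_state s, enum off_event ev)
{
	assert(s <= OFF_DROP);
	switch (s) {
	case OFF_PARSED:
		if (ev == EV_RULE_MATCH)
			return OFF_WAIT_USER;
		return ev == EV_RULE_MISS ? OFF_FORWARD : OFF_DROP;
	case OFF_WAIT_USER:
		switch (ev) {
		case EV_USER_REQUEST:
			return OFF_FORWARD;
		case EV_USER_RESPONSE:
			return OFF_RESPOND;
		default:
			return OFF_DROP;
		}
	default:
		return s;	/* final states have no outgoing transitions */
	}
}
```

The key point is OFF_WAIT_USER: the message context stays parked there while softirq handles other traffic, and only a user-space event moves it on.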

User-space logic may produce a larger HTTP message than the original one, e.g. by adding an HTTP header. This can be handled by allocating a page fragment (also in user space) and passing it to the kernel together with the fragment offset, so that the kernel can properly arrange the skb fragments.
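Inserting a user-allocated fragment amounts to splitting one fragment descriptor and placing the new one in between, so no payload bytes move. A sketch assuming a hypothetical `struct frag` (page index, offset, length) and `frags_insert()`; in the kernel the resulting array would describe skb page fragments, while `frags_gather()` here only copies the bytes out to check the result:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: a byte range inside one of the shared pages. */
struct frag {
	uint32_t page;	/* index of the page in the shared region */
	uint32_t off;	/* offset of the data within the page */
	uint32_t len;
};

/*
 * Insert fragment @nf at message offset @at, splitting the fragment that
 * contains @at if needed. Returns the new fragment count, or -1 if @at is
 * past the end of the message or @cap is too small.
 */
static int
frags_insert(struct frag *f, int n, int cap, uint32_t at, struct frag nf)
{
	int i;

	for (i = 0; i < n && at > f[i].len; i++)
		at -= f[i].len;
	if (i == n)
		return -1;

	int ins = at == 0 ? i : i + 1;
	int split = at > 0 && at < f[i].len;
	int extra = split ? 2 : 1;

	if (n + extra > cap)
		return -1;
	memmove(&f[ins + extra], &f[ins], (size_t)(n - ins) * sizeof(*f));
	if (split) {
		/* The tail of the split fragment goes after the new one. */
		f[i + 2] = (struct frag){ f[i].page, f[i].off + at, f[i].len - at };
		f[i].len = at;
	}
	f[ins] = nf;
	return n + extra;
}

/* Copy the message out, only to verify the fragment layout. */
static size_t
frags_gather(const char *const *pages, const struct frag *f, int n, char *out)
{
	size_t len = 0;

	for (int i = 0; i < n; i++) {
		memcpy(out + len, pages[f[i].page] + f[i].off, f[i].len);
		len += f[i].len;
	}
	return len;
}
```

For example, inserting the page fragment "X-Tag: 1\r\n" at offset 16 of "GET / HTTP/1.1\r\nHost: a\r\n\r\n" splits the single original fragment after the request line and yields three fragments.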

Since a user-space application may run in a virtual container, the mapping transport must be container-aware and provide configuration specifying which HTTP messages are mapped to which containers, probably on a per-vhost and per-location basis.
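A per-vhost, per-location mapping could be resolved with a longest-prefix lookup, much like location matching in web server configs. A sketch with a hypothetical `struct ctr_rule` table and `ctr_lookup()` function; the actual configuration format is not defined by this issue:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping rule: messages for vhost + location go to a container. */
struct ctr_rule {
	const char *vhost;
	const char *location;	/* URI prefix */
	const char *container;	/* container name or ID */
};

/* Pick the container with the longest matching location for the vhost. */
static const char *
ctr_lookup(const struct ctr_rule *r, size_t n, const char *vhost,
	   const char *uri)
{
	const char *best = NULL;
	size_t best_len = 0;

	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		size_t l = strlen(r[i].location);

		if (strcmp(r[i].vhost, vhost) != 0
		    || strncmp(uri, r[i].location, l) != 0)
			continue;
		if (!best || l > best_len) {
			best = r[i].container;
			best_len = l;
		}
	}
	return best;	/* NULL: no container is configured for this message */
}
```

With rules { "shop.example", "/", "shop-default" } and { "shop.example", "/api/", "shop-api" }, a request for "/api/cart" goes to "shop-api" and anything else on that vhost to "shop-default".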

API

A C API must be provided so that it can be used from various programming languages like C, C++, Rust or Python.

io_uring should probably be used for the API; see also the generic ring buffer API proposal for the Linux kernel.
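io_uring shares its submission and completion queues with user space as rings indexed by head and tail counters, where each side advances only its own counter. A minimal single-producer/single-consumer sketch of that idea using C11 atomics; the `struct ring` layout is hypothetical, and here it lives in ordinary memory rather than in a kernel-user shared mapping:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

struct ring {
	_Atomic uint32_t head;		/* next slot to consume, advanced by consumer */
	_Atomic uint32_t tail;		/* next slot to fill, advanced by producer */
	uint64_t slot[RING_SIZE];	/* e.g. offsets of messages in shared pages */
};

static bool
ring_push(struct ring *r, uint64_t v)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail - head == RING_SIZE)
		return false;	/* full */
	r->slot[tail & (RING_SIZE - 1)] = v;
	/* Publish the slot contents before the new tail becomes visible. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}

static bool
ring_pop(struct ring *r, uint64_t *v)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head == tail)
		return false;	/* empty */
	*v = r->slot[head & (RING_SIZE - 1)];
	/* Let the producer reuse the slot only after it has been read. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}
```

The free-running 32-bit counters wrap around harmlessly because RING_SIZE divides 2^32, which is why the size must be a power of two.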

Asynchronous processing

Using an event-driven architecture, like Nginx, modern HTTP servers can process thousands of requests concurrently on modern multi-core machines. However, there are still heavy computational tasks which lead to high response times at the upper percentiles, e.g. data compression or some security checks such as parsing and analyzing the DOM tree of a large HTTP response. These tasks are performed on the CPU and cannot be offloaded to a co-processor that would leave the CPU free to process other HTTP messages. While some tasks can be offloaded to a GPU, e.g. TLS handshakes, other tasks work with large memory volumes in stream mode, e.g. HTTP POST processing, so it doesn't make sense to offload them to a GPU. Thus, if a server has N CPU cores and receives N HTTP requests with expensive CPU computations, it cannot process other light-weight requests.

This task, offloading some HTTP processing to user space, solves the problem of synchronous processing: expensive CPU computations can be offloaded to user space, where they are processed with a lower scheduler priority, while softirq continues to work with other HTTP requests. GFSM is useful here to store the HTTP message processing context for user-space processing.
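On the user-space side, the lower scheduler priority can be requested with the standard POSIX `getpriority()`/`setpriority()` calls. A sketch with a hypothetical `worker_lower_priority()` helper that a processing daemon could call at startup; how Tempesta FW would actually spawn such workers is not specified here:

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/*
 * Raise the nice value of the calling worker by @incr, so that expensive
 * offloaded jobs yield the CPU to latency-sensitive work.
 * Returns 0 and stores the new nice value in @new_nice, or -1 on error.
 */
static int
worker_lower_priority(int incr, int *new_nice)
{
	int cur;

	/* Unprivileged processes may only lower their priority. */
	assert(incr >= 0);
	errno = 0;
	cur = getpriority(PRIO_PROCESS, 0);
	if (cur == -1 && errno != 0)	/* -1 is also a valid nice value */
		return -1;
	/* Linux silently clamps values above 19 to 19. */
	if (setpriority(PRIO_PROCESS, 0, cur + incr) != 0)
		return -1;
	errno = 0;
	*new_nice = getpriority(PRIO_PROCESS, 0);
	return (*new_nice == -1 && errno != 0) ? -1 : 0;
}
```

On Linux, switching the worker to the SCHED_BATCH or SCHED_IDLE policy via `sched_setscheduler()` is an alternative with a stronger effect than a nice value.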

Synchronous processing

Some logic (e.g. security applications) must make a decision (pass/block) or mangle traffic synchronously, so that malicious traffic is not passed to a protected backend server. This processing type can be done in the same user-space process as the asynchronous one, probably using GFSM or some synchronization mechanism in shared memory.
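A synchronization mechanism in shared memory could be as simple as a per-message slot whose state word carries the verdict, written with release and read with acquire ordering. A single-threaded sketch of that protocol; `struct verdict_slot`, its states and the toy "../" blocking rule are all hypothetical, and a real implementation would also need a wait/wake-up mechanism between the two sides:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

enum slot_state { SLOT_FREE, SLOT_PENDING, SLOT_PASS, SLOT_BLOCK };

/* Hypothetical slot in memory shared by the kernel and the user program. */
struct verdict_slot {
	_Atomic int state;
	const char *msg;	/* NUL-terminated for brevity */
};

/* Kernel side: publish a message that needs a verdict. */
static void
slot_post(struct verdict_slot *s, const char *msg)
{
	assert(atomic_load(&s->state) == SLOT_FREE);
	s->msg = msg;
	/* Release: @msg must be visible before the state change. */
	atomic_store_explicit(&s->state, SLOT_PENDING, memory_order_release);
}

/* User-space side: decide pass/block for a pending message. */
static bool
slot_inspect(struct verdict_slot *s)
{
	if (atomic_load_explicit(&s->state, memory_order_acquire) != SLOT_PENDING)
		return false;
	/* Toy rule for illustration only: block path traversal attempts. */
	int v = strstr(s->msg, "../") ? SLOT_BLOCK : SLOT_PASS;
	atomic_store_explicit(&s->state, v, memory_order_release);
	return true;
}

/* Kernel side: fetch the verdict, freeing the slot once it is known. */
static int
slot_poll(struct verdict_slot *s)
{
	int st = atomic_load_explicit(&s->state, memory_order_acquire);

	if (st == SLOT_PASS || st == SLOT_BLOCK)
		atomic_store_explicit(&s->state, SLOT_FREE, memory_order_relaxed);
	return st;
}
```

Until user space writes a verdict, the kernel side only sees SLOT_PENDING and must hold the message back, which is exactly what makes this path synchronous.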

Dynamic programs

The API must allow registering (attaching) new synchronous and asynchronous user-space programs at run time, without restarting Tempesta FW (just like BPF programs).
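Run-time attachment can be pictured as a registry of named handlers that is consulted for every message. A sketch with hypothetical `prog_attach()`, `prog_detach()` and `prog_run()` functions and an example `count_get` program; it ignores concurrency, whereas a real implementation would need RCU-like synchronization so that detaching a program cannot race with messages still being processed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PROGS 16

/* Hypothetical program signature: returns true if it consumed the message. */
typedef bool (*msg_prog_fn)(const char *msg, size_t len);

struct prog {
	const char *name;
	bool sync;		/* synchronous (verdict) or asynchronous program */
	msg_prog_fn fn;		/* NULL marks a free slot */
};

static struct prog progs[MAX_PROGS];

static struct prog *
prog_find(const char *name)
{
	for (int i = 0; i < MAX_PROGS; i++)
		if (progs[i].fn && strcmp(progs[i].name, name) == 0)
			return &progs[i];
	return NULL;
}

static int
prog_attach(const char *name, bool sync, msg_prog_fn fn)
{
	assert(name && fn);
	if (prog_find(name))
		return -1;	/* already attached */
	for (int i = 0; i < MAX_PROGS; i++)
		if (!progs[i].fn) {
			progs[i] = (struct prog){ name, sync, fn };
			return 0;
		}
	return -1;		/* no free slots */
}

static int
prog_detach(const char *name)
{
	struct prog *p = prog_find(name);

	if (!p)
		return -1;
	p->fn = NULL;
	return 0;
}

/* Run attached programs of one kind until one consumes the message. */
static int
prog_run(const char *msg, size_t len, bool sync)
{
	int n = 0;

	for (int i = 0; i < MAX_PROGS; i++) {
		if (!progs[i].fn || progs[i].sync != sync)
			continue;
		n++;
		if (progs[i].fn(msg, len))
			break;
	}
	return n;	/* number of programs invoked */
}

/* Example asynchronous program: count GET requests, never consume. */
static unsigned get_count;

static bool
count_get(const char *msg, size_t len)
{
	if (len >= 4 && memcmp(msg, "GET ", 4) == 0)
		get_count++;
	return false;
}
```

After `prog_attach("count_get", false, count_get)`, every asynchronous run sees the program; after `prog_detach("count_get")` it is skipped, with no restart involved.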

Serverless

If we map all the pages with HTTP messages as read-only for user space and use a separate memory area for writing, then this can be an alternative to the modern serverless architecture: an unprivileged user may read their traffic and run some logic in a separate address space.
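The read-only/writable split can be illustrated with plain `mmap()` and `mprotect()`. In this sketch the hypothetical `user_view_setup()` stands in for the kernel side: it copies the message into fresh pages (where the real transport would map the packet pages directly, without copying), makes them read-only, and gives the program a separate writable scratch area:

```c
#define _DEFAULT_SOURCE		/* MAP_ANONYMOUS with strict -std= modes */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct user_view {
	const char *traffic;	/* read-only view of the HTTP message */
	char *scratch;		/* private writable area for the program output */
	size_t map_len;
};

static int
user_view_setup(struct user_view *v, const char *msg, size_t len)
{
	size_t pg = (size_t)sysconf(_SC_PAGESIZE);
	char *p;

	assert(len > 0);
	v->map_len = (len + pg - 1) / pg * pg;
	p = mmap(NULL, v->map_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	memcpy(p, msg, len);	/* stands in for the kernel filling the pages */
	if (mprotect(p, v->map_len, PROT_READ) != 0) {
		munmap(p, v->map_len);
		return -1;
	}
	v->traffic = p;
	/* Scratch area sized like the message only for simplicity. */
	v->scratch = mmap(NULL, v->map_len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (v->scratch == MAP_FAILED) {
		munmap(p, v->map_len);
		return -1;
	}
	return 0;
}

static void
user_view_teardown(struct user_view *v)
{
	munmap((void *)v->traffic, v->map_len);
	munmap(v->scratch, v->map_len);
}
```

Any attempt by the program to write through `traffic` faults with SIGSEGV, while its output, e.g. a generated response, goes to `scratch`.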

Failover

A user-space HTTP message handling program can work as a Linux process or as a Docker or LXC container. If the program crashes in a container, the container infrastructure is responsible for restarting the process. However, for a plain Linux process, Tempesta FW must take care of restarting the process.

This behaviour is inspired by Erlang OTP and will make C/C++ web applications more reliable: in the worst case, a user will have a CGI-like application which spawns a new process for each request, but in the normal case we'll have a true application server with neither the risk of the whole server crashing nor the extra cost of FastCGI.
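For plain Linux processes, the restart duty resembles an OTP supervisor: run the worker as a child, and if it dies abnormally, start it again within a restart budget. A sketch using `fork()` and `waitpid()`; `supervise()` and the `flaky_worker()` example are hypothetical, and a production supervisor would also add restart back-off and logging:

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker body; receives how many times it has been restarted. */
typedef int (*worker_fn)(int restarts);

/*
 * Run @fn in a child process and restart it whenever it is killed by a
 * signal or exits with a non-zero status, at most @max_restarts times.
 * Returns the number of restarts, or -1 on fork()/waitpid() failure or
 * when the restart budget is exhausted.
 */
static int
supervise(worker_fn fn, int max_restarts)
{
	assert(max_restarts >= 0);
	for (int restarts = 0; ; restarts++) {
		int status;
		pid_t pid = fork();

		if (pid < 0)
			return -1;
		if (pid == 0)
			_exit(fn(restarts));
		if (waitpid(pid, &status, 0) < 0)
			return -1;
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
			return restarts;	/* clean shutdown */
		if (restarts == max_restarts)
			return -1;		/* give up */
	}
}

/* Example worker that crashes on its first two runs. */
static int
flaky_worker(int restarts)
{
	if (restarts < 2)
		raise(SIGKILL);	/* simulate a crash */
	return 0;
}
```

`supervise(flaky_worker, 5)` returns 2 after two restarts, while a budget of 1 is exhausted and yields -1.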

References

An example of a similar solution, zero-copy receive in Linux via io_uring, is presented in the Fast ZC Rx Data Plane using io_uring talk.

ai-tmpst commented 3 weeks ago

A ring buffer mapped to user space is being developed in the scope of #537. It could be useful for this task; see fw/ringbuffer.*.