tempesta-tech / tempesta

All-in-one solution for high performance web content delivery and advanced protection against DDoS and web attacks
https://tempesta-tech.com/
GNU General Public License v2.0

Custom logging #537

Open krizhanovsky opened 8 years ago

krizhanovsky commented 8 years ago

Currently, everything from config file parsing errors to client blocking is written to dmesg. Instead, logs on top of TempestaDB must be introduced.

Two modes of logging must be implemented:

  1. TDB tables with automatic eviction of old records (a ring buffer) to fit in RAM
  2. log transmission to a remote server using synchronous TCP sockets (so records are not lost under peak load)

The logs should be configured by independent configuration options:

    log_access /opt/tempesta/db/log_access.tdb <variables>
    log_error /opt/tempesta/db/log_error.tdb <variables>
    log_security 192.168.1.1:5000 <variables>

<variables> is the list of variables to log:

  1. remote_addr - remote user IP
  2. time - local time (milliseconds since epoch)
  3. method - request method
  4. uri
  5. resp_status - response status
  6. body_sent - response bytes sent
  7. resp_time - response time
  8. values of any special header, with the srvhdr_ or clnthdr_ prefix and - changed to _, e.g. srvhdr_set_cookie or clnthdr_user_agent
  9. cache miss/hit
  10. origin IP

In general, Tempesta DB should provide the streaming data processing foundation (#515 and #516) for the logging application. We probably need to keep all request and response headers together with their metadata (e.g. timings, chunk information, TCP/IP and TLS layers etc.) for a relatively short sliding window. Such data is extremely useful for debugging immediate performance issues and DDoS attacks.

TBD a possible application is to batch events in front of a time series DB, e.g. ClickHouse, InfluxDB or TimescaleDB.

We need to support per-vhost logging, i.e. thousands of vhosts with several logs each. This could be done either with a secondary index (#733) or by scaling to thousands of TDB tables.

For better performance, logs must use a simple sequential ring-buffer TDB table format w/o any indexes (#516). Log records must be stored as structured TDB records. Probably we don't even need TDB for this and can just mmap the ring buffer into user space.

The binary log format could be

    <event_type><timestamp><var0><var1>...<varN>

, where event_type defines the event type and its format (the number of variables and their types, e.g. client address, URI, HTTP Host header etc.). In this release the format must be simple and hardcoded.

A simple retrieval user-space tool, similar to varnishlog, must be developed on top of tdbq to print the logs and/or write them to files in human-readable or JSON formats. The tool must also be able to run in daemon mode, reading the TDB tables and flushing the log records to files or syslogd.

The human-readable text format should be compatible with the W3C draft, but should also provide more information.
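For reference, the W3C Extended Log File Format draft uses directive lines followed by whitespace-separated fields, so the tool's text output could look roughly like this (field values are made up for illustration):

    #Version: 1.0
    #Fields: date time c-ip cs-method cs-uri sc-status sc-bytes time-taken
    2024-01-15 12:00:01 10.0.0.1 GET /index.html 200 1234 0.012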

Also reference TUX and HAProxy, which also use(d) binary logging.

krizhanovsky commented 6 months ago

At the moment there are hundreds of log messages of various levels with generic printf()-like formats. Fortunately, all of them are printed with macros like T_ERR or T_WARN, so they can be preprocessed by a tool which builds a C table of indexes and compiled formats to avoid format conversion at runtime. See qrintf.

One more approach is to log only binary data (e.g. integers, floats, and nanoseconds since epoch) into a ring buffer and use a log reading tool to process the binary data. This is very close to what HAProxy does: https://www.youtube.com/watch?v=762owEyCI4o

The access log is a separate case: it can be used to compute advanced statistics - a larger log allows statistics over a longer period. E.g. with the current access log we can compute statistics for each return code, so we don't actually need the counters implemented in https://github.com/tempesta-tech/tempesta/pull/2023 (#1454). The access log can also be extended with:

Probably #2023 should be reverted; alternatively, we could provide an application-layer parser which computes the statistics. This can probably be done with the same library as tdbq, see #279 and #528.