
TDBv0.2: Cache background revalidation and eviction #515

Open krizhanovsky opened 8 years ago

krizhanovsky commented 8 years ago

Depends on https://github.com/tempesta-tech/tempesta/issues/1869

Scope

The tfw_cache_mgr thread must traverse the web cache and evict stale records on memory pressure, or revalidate them otherwise. The thread must be accurately scheduled and throttled so as not to impact system performance, while still freeing the required memory efficiently. #500 must be kept in mind as well.
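A minimal sketch, in plain user-space C with hypothetical helper names (the real implementation would be a kernel thread working on the TDB index), of what such a budgeted, throttled scan could look like:

```c
/* Hypothetical throttled cache scan, for illustration only. */
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

#define SCAN_BATCH 64	/* records inspected per wakeup */

/* Stubs standing in for the real cache index and VM pressure check. */
static bool cache_next_record(size_t *cursor) { return ++*cursor < 1024; }
static bool memory_pressure(void) { return false; }
static void evict_or_revalidate(size_t rec, bool pressure) { (void)rec; (void)pressure; }

static void cache_mgr_loop(void)
{
	size_t cursor = 0;

	for (;;) {
		/* Bounded batch: the scan can never monopolize a CPU. */
		for (int i = 0; i < SCAN_BATCH; i++) {
			if (!cache_next_record(&cursor))
				return;
			evict_or_revalidate(cursor, memory_pressure());
		}
		/* Yield between batches; sleep less under memory pressure
		 * so eviction frees memory quickly when it is needed. */
		usleep(memory_pressure() ? 1000 : 100000);
	}
}

int main(void) { cache_mgr_loop(); return 0; }
```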

Validation logic is defined by RFC 7234 §4.3 and requires the implementation of conditional requests.
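For reference, a revalidation round trip with a conditional request looks like this on the wire (standard RFC 7232 mechanics; the ETag value here is made up):

```
GET /research/web_acceleration_mechanics.pdf HTTP/1.1
Host: example.com
If-None-Match: "5e8f2c-1a2b3c"

HTTP/1.1 304 Not Modified
ETag: "5e8f2c-1a2b3c"
Cache-Control: max-age=3600
```

A 304 lets the cache extend the entry's freshness without re-transferring the body; a full 200 response simply replaces the entry.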

Keep in mind the DoS attack from #520. The following items, linked with #516 (TDB v0.3), must be implemented:

This task is required to fix #803.

UPD. Since filtering (#731) and QoS (#488) also require eviction, this work should be done in the tdb_mgr thread instead.

UPD. TDB was designed to provide access to stored data in a zero-copy fashion, such that a cached response body can be sent directly to a socket. This property imposed several design limitations and introduced many difficulties. However, with TLS we always have to copy data, so the TDB design can be significantly simplified by copying. Hence, this depends on #634.

Cache eviction

While CART is a well-known, good adaptive replacement algorithm, there are a number of caching algorithms based on machine learning which provide a much better cache hit ratio. See, for example, the survey and Cacheus. Some of these algorithms require access to columnar storage for statistics (a common practice in CDNs).

At least some interface for a user-space algorithm is required. Probably just CART with some weights, where the weights are loaded from user space into the kernel, would be enough.
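A minimal sketch (user-space C; every structure, field, and weight below is a made-up assumption, not a TDB interface) of how a user-loaded weight vector could turn per-entry statistics into an eviction score:

```c
/* Hypothetical weighted eviction scoring, for illustration only. */
#include <stdint.h>
#include <stdio.h>

struct cache_entry_stats {
	uint64_t hits;		/* access frequency */
	uint64_t last_access;	/* recency, e.g. in clock ticks */
	uint64_t size;		/* bytes occupied in the cache */
};

/* Weights loaded from user space (e.g. via sysctl or netlink). */
struct eviction_weights {
	double w_freq;
	double w_recency;
	double w_size;
};

/* Lower score => evict first. */
static double eviction_score(const struct cache_entry_stats *s,
			     const struct eviction_weights *w,
			     uint64_t now)
{
	double age = (double)(now - s->last_access);

	return w->w_freq * (double)s->hits
	       - w->w_recency * age
	       - w->w_size * (double)s->size;
}

int main(void)
{
	struct eviction_weights w = { .w_freq = 1.0, .w_recency = 0.01,
				      .w_size = 1e-6 };
	struct cache_entry_stats hot = { .hits = 500, .last_access = 990,
					 .size = 4096 };
	struct cache_entry_stats cold = { .hits = 3, .last_access = 100,
					  .size = 1 << 20 };
	uint64_t now = 1000;

	/* The cold, large entry gets the lower score and goes first. */
	printf("hot: %.2f cold: %.2f\n",
	       eviction_score(&hot, &w, now),
	       eviction_score(&cold, &w, now));
	return 0;
}
```

Here the weight vector is the only thing user space needs to push into the kernel; the statistics stay kernel-side, which keeps the interface small.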

The cache must implement per-vhost eviction strategies and space quotas to provide caching QoS for CDN use cases. Probably 2-layer quotas are required to contain the impact of poor configuration, e.g. a bad Vary specification on the application side, which may consume too much space (linked with #733). Different eviction strategies are required to handle, e.g., chunks of live streams (huge data volume, immediately remove outdated chunks) and rarely updated web content like CSS (may serve stale entries).
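Purely to illustrate the 2-layer idea, a hypothetical configuration sketch (none of these directives exist in Tempesta today; all names are made up):

```
# Hypothetical syntax, for illustration only.
vhost static.example.com {
	cache_quota      2GB;          # layer 1: total space cap for the vhost
	cache_key_quota  64MB;         # layer 2: cap on all Vary variants of one URI
	cache_evict      serve_stale;  # CSS-like content: stale entries are OK
}
vhost live.example.com {
	cache_quota      8GB;
	cache_evict      drop_stale;   # live-stream chunks: purge the moment they expire
}
```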

It must be possible to 'lock' some records in evictable data sets (see #858 and #471).

Purging

Once this feature is implemented, we should be able to update site content normally, without a Tempesta restart or memory leaks. It's hard to track which new pages appeared and which were deleted during a site content update, so in this task we need:

  1. full web content purging;
  2. regular expression purging, e.g. /foo/*.php or /foo/bar/* (see the wildcard-matching sketch after this list);
  3. ~~immediate (purge in the original #501) strategy for the purging (we still need the mode that leaves stale responses in the cache for #522);~~ Done in #2074.
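A toy sketch (plain user-space C, not Tempesta code) of how such wildcard purge patterns could be matched against cached URI keys, where `*` matches any run of characters:

```c
/* Toy '*' wildcard matcher for purge patterns, for illustration only. */
#include <stdbool.h>
#include <stdio.h>

static bool glob_match(const char *pat, const char *uri)
{
	/* Iterative matcher with backtracking to the last '*'. */
	const char *star = NULL, *resume = NULL;

	while (*uri) {
		if (*pat == *uri) {
			pat++, uri++;
		} else if (*pat == '*') {
			star = pat++;	/* remember the star... */
			resume = uri;	/* ...and where to retry from */
		} else if (star) {
			pat = star + 1;	/* backtrack: let '*' eat one more char */
			uri = ++resume;
		} else {
			return false;
		}
	}
	while (*pat == '*')
		pat++;
	return !*pat;
}

int main(void)
{
	printf("%d\n", glob_match("/foo/*.php", "/foo/index.php")); /* 1 */
	printf("%d\n", glob_match("/foo/bar/*", "/foo/bar/a/b"));   /* 1 */
	printf("%d\n", glob_match("/foo/*.php", "/foo/a.html"));    /* 0 */
	return 0;
}
```

In practice the pattern would be applied to keys during the background scan sketched above; the matching itself is cheap.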

Documentation

Need to update the https://github.com/tempesta-tech/tempesta/wiki/Caching-Responses#manual-cache-purging wiki page.

Testing

krizhanovsky commented 3 years ago

It seems there is some race in the lock-free index, or we actually hit the https://github.com/tempesta-tech/tempesta/issues/500 problem in the scenario from #1435: multiple parallel requests to a large file

./wrk -d 3600 -c 16000 -t 8 -H 'connection: close' https://debian:443/research/web_acceleration_mechanics.pdf

combined with Tempesta restarts in the VM

# while :; do ./scripts/tempesta.sh --restart; sleep 30; done

sometimes produces warnings like

[ 1103.775556] [tdb] ERROR: out of free space
[ 1103.810415] [tdb] ERROR: out of free space
[ 1103.845177] [tdb] ERROR: Cannot allocate cache entry for key=0x37bed983985f3ea7
[ 1103.929897] [tdb] ERROR: out of free space
[ 1103.949002] [tdb] ERROR: Cannot allocate cache entry for key=0x37bed983985f3ea7
[ 1103.984315] [tdb] ERROR: out of free space
[ 1104.010543] [tdb] ERROR: Cannot allocate cache entry for key=0x37bed983985f3ea7
[ 1104.070816] [tdb] ERROR: out of free space
[ 1104.080997] [tdb] ERROR: Cannot allocate cache entry for key=0x37bed983985f3ea7
[ 1104.151540] [tdb] ERROR: out of free space
[ 1104.158845] [tdb] ERROR: Cannot allocate cache entry for key=0x37bed983985f3ea7
[ 1104.199489] [tdb] ERROR: out of free space
[ 1104.231891] [tdb] ERROR: Cannot allocate cache entry for key=0x37bed983985f3ea7
....
krizhanovsky commented 2 years ago

The task must be split. After #788, the most crucial part is removing cache entries for #522, plus some basic eviction to get the cache usable, i.e. to get rid of the memory leak.

const-t commented 1 year ago

I've run a few rough HTTP/2 benchmarks with caching enabled.

h2load -c700 -m100 --duration=30 -t2 https://debian

Tempesta

1kb response

finished in 30.14s, 337279.80 req/s, 393.06MB/s
requests: 10118394 total, 10188394 started, 10118394 done, 10118394 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 10118394 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 11.52GB (12364696856) total, 1.70GB (1821310920) headers (space savings 23.08%), 9.65GB (10361235456) data
                     min         max         mean         sd        +/- sd
time for request:      391us    404.11ms     69.33ms     52.31ms    64.69%
time for connect:    70.24ms    229.04ms    169.16ms     56.50ms    61.71%
time to 1st byte:   195.61ms    323.51ms    252.20ms     27.06ms    79.96%
req/s           :       0.00     4462.36      803.41      771.99    59.29%

5kb response

finished in 30.23s, 229514.40 req/s, 1.14GB/s
requests: 6885532 total, 6955433 started, 6885532 done, 6885432 succeeded, 100 failed, 100 errored, 0 timeout
status codes: 6885469 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 34.16GB (36684160200) total, 1.16GB (1244661614) headers (space savings 23.00%), 32.83GB (35253572326) data
                     min         max         mean         sd        +/- sd
time for request:    17.12ms    698.47ms    103.21ms     39.88ms    90.88%
time for connect:    73.25ms    237.29ms    165.14ms     56.21ms    69.57%
time to 1st byte:   210.69ms    299.74ms    253.76ms     25.23ms    58.53%
req/s           :       0.00      603.40      366.27      247.73    69.86%

128kb response

finished in 30.36s, 17200.80 req/s, 2.11GB/s
requests: 516024 total, 586024 started, 516024 done, 516024 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 516273 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 63.24GB (67904607399) total, 90.50MB (94901121) headers (space savings 22.71%), 63.01GB (67651755146) data
                     min         max         mean         sd        +/- sd
time for request:    47.50ms      18.31s    998.10ms       1.12s    95.44%
time for connect:    70.58ms    254.74ms    159.74ms     56.57ms    68.43%
time to 1st byte:   203.41ms    474.57ms    360.97ms     78.33ms    58.21%
req/s           :       0.00      181.65       31.60       47.24    77.14%

128kb response with HTTP/1

finished in 30.37s, 21665.00 req/s, 2.65GB/s
requests: 649950 total, 719750 started, 649950 done, 649950 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 650181 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 79.52GB (85388074799) total, 142.43MB (149350032) headers (space savings 0.00%), 79.95GB (85844417328) data
                     min         max         mean         sd        +/- sd
time for request:    27.77ms       2.64s    510.89ms    293.07ms    85.45%
time for connect:    76.97ms    210.16ms    152.70ms     47.93ms    69.34%
time to 1st byte:   187.62ms    302.22ms    253.48ms     39.83ms    54.62%
req/s           :       0.00      336.64       48.35       78.75    82.86%

Nginx (nginx/1.23.3)

1kb response

finished in 30.15s, 135510.73 req/s, 150.56MB/s
requests: 4065322 total, 4135322 started, 4065322 done, 4065322 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 4065322 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 4.41GB (4736134430) total, 476.87MB (500034606) headers (space savings 33.15%), 3.88GB (4162889728) data
                     min         max         mean         sd        +/- sd
time for request:     1.45ms       1.54s    530.87ms    307.86ms    70.73%
time for connect:    15.54ms    374.44ms    123.50ms     85.68ms    77.57%
time to 1st byte:   179.61ms    909.80ms    359.37ms    165.22ms    86.00%
req/s           :     109.97      366.27      193.44       80.16    71.71%

5kb response

finished in 30.16s, 168594.90 req/s, 846.10MB/s
requests: 5057847 total, 5127847 started, 5057847 done, 5057847 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 5065270 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 24.79GB (26616104602) total, 599.00MB (628093480) headers (space savings 32.97%), 24.12GB (25896832020) data
                     min         max         mean         sd        +/- sd
time for request:      359us       5.39s    432.35ms    460.44ms    87.07%
time for connect:    22.18ms    265.32ms    123.70ms     63.49ms    57.29%
time to 1st byte:   219.39ms       2.17s    803.55ms    511.62ms    59.57%
req/s           :      55.85      558.71      240.58      163.94    72.29%

128kb response

finished in 30.27s, 16222.27 req/s, 2.05GB/s
requests: 486668 total, 556668 started, 486668 done, 486668 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 548023 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 61.56GB (66099265904) total, 65.85MB (69050898) headers (space savings 32.98%), 61.42GB (65952787645) data
                     min         max         mean         sd        +/- sd
time for request:    21.49ms      29.62s       3.73s       3.07s    71.63%
time for connect:    23.21ms    310.06ms    147.42ms     71.60ms    57.86%
time to 1st byte:   247.08ms       1.68s    754.43ms    418.40ms    52.57%
req/s           :       3.10      175.05       23.13       21.80    88.00%

FYI: sometimes h2load freezes at the end of benchmarking Tempesta. It looks like Tempesta holds the connection open.

krizhanovsky commented 2 months ago

With the latest discussion https://github.com/tempesta-tech/tempesta-test/pull/602/files#r1622305438 and our website purging issue https://github.com/tempesta-tech/tempesta-tech.com/issues/64, it could make sense to make the eviction thread also send conditional requests for particular resources (typically defined as dynamic, e.g. wiki or blog posts in our case).

This causes extra overhead for both the upstream and Tempesta servers and introduces delays. It's much worse than cache purge plugins, but it would solve our problem, and maybe similar problems for others. TBD: this solves the problem in a less-than-ideal way and requires development effort...