tempesta-tech / tempesta

All-in-one solution for high performance web content delivery and advanced protection against DDoS and web attacks
https://tempesta-tech.com/
GNU General Public License v2.0

BUG at /root/tempesta/tempesta_fw/str.c:340 #320

krizhanovsky closed 8 years ago

krizhanovsky commented 8 years ago

     kernel:Kernel panic - not syncing: Fatal exception in interrupt
     Oct 20 07:07:03 vip-1 kernel: kernel BUG at /root/tempesta/tempesta_fw/str.c:340!
     Oct 20 07:07:03 vip-1 kernel: RIP [] tfw_str_eq_cstr+0xf6/0x100 [tempesta_fw]
     Oct 20 07:07:03 vip-1 kernel: ---[ end trace d6e1045fde362742 ]---
     Oct 20 07:07:03 vip-1 kernel: Kernel panic - not syncing: Fatal exception in interrupt
     Oct 20 07:07:03 vip-1 kernel: drm_kms_helper: panic occurred, switching back to text console
     Oct 20 07:07:03 vip-1 kernel: ------------[ cut here ]------------
     Oct 20 07:07:03 vip-1 kernel: WARNING: at kernel/rcutree.c:388 rcu_eqs_enter+0x8b/0xa0()
     Oct 20 07:07:03 vip-1 kernel: ---[ end trace d6e1045fde362743 ]---
     Oct 20 07:07:03 vip-1 kernel: ------------[ cut here ]------------
     Oct 20 07:07:03 vip-1 kernel: WARNING: at kernel/rcutree.c:528 rcu_eqs_exit+0x89/0xa0()
     Oct 20 07:07:03 vip-1 kernel: ---[ end trace d6e1045fde362744 ]---

[10:27:09 PM] zuer rong: system hung.

keshonok commented 8 years ago

This issue may be connected with #328, as the bug occurs in tfw_str_eq_cstr() on a test of whether a TfwStr{} string is DUPLICATE. It must not be a duplicate when this function is called.
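To make the failing precondition concrete, here is a minimal userspace sketch of such a check. The names `DemoStr`, `DEMO_STR_DUPLICATE` and `demo_str_eq_cstr()` are hypothetical, and the single-chunk layout is an assumption made for illustration. This is not the code from str.c, which hits a kernel BUG instead of returning an error.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit: the string is a container of duplicate fields. */
#define DEMO_STR_DUPLICATE 0x01u

/* Hypothetical, simplified stand-in for TfwStr: one contiguous chunk. */
typedef struct {
    const char   *ptr;
    size_t        len;
    unsigned int  flags;
} DemoStr;

/*
 * Compare a DemoStr with a plain C buffer.
 * Returns 1 if equal, 0 if different, and -1 if the precondition
 * "the string must not be a duplicate" is violated.
 */
static int
demo_str_eq_cstr(const DemoStr *s, const char *cstr, size_t cstr_len)
{
    if (s->flags & DEMO_STR_DUPLICATE)
        return -1;
    if (s->len != cstr_len)
        return 0;
    return memcmp(s->ptr, cstr, cstr_len) == 0;
}
```

With this shape, a string whose `ptr` and `len` are intact but whose `flags` word has a stray duplicate bit set is rejected before any data is compared, which is consistent with the failure pattern described in this thread.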

I am unable to reproduce this bug so far. A possible culprit has been found and pushed to master in 356f620.

krizhanovsky commented 8 years ago

My investigations:

Two different functions, tfw_http_msg_field_chunk_fixup() (#328) and tfw_str_eq_cstr() (this issue), fail at the same assertion, while there were no crashes on other TfwStr members (such as len, skb or ptr), so it seems that the bug affects flags only. Moreover, #328 and this issue happened at different times and in different tests, so the probability of an improper build is small. KASAN also didn't find any issues.

One more interesting thing is that the workload that triggered the crash was pretty simple: just ab or wrk benchmarking. These tools send very simple requests without duplicate or long fields. The requests are:

     GET / HTTP/1.0
     Host: 172.16.0.5
     User-Agent: ApacheBench/2.3
     Accept: */*

     GET / HTTP/1.1^M
     Host: 172.16.0.5^M

for ab and wrk respectively.

Testing configuration (supposedly the one used in the original tests):

server 127.0.0.1:8080;
cache 0;
sticky name=__cookie__ enforce;

frang_limits {
    request_rate 100000;
    request_burst 100000;
    connection_rate 20000;
    concurrent_connections 20000;
    client_header_timeout 20;
    client_body_timeout 10;
    http_uri_len 1024;
    http_field_len 256;
    http_ct_required false;
    http_methods get post head;
    http_ct_vals "text/plain" "text/html";
    http_header_chunk_cnt 10;
    http_body_chunk_cnt 0;
    http_header_cnt 16;
}

For now I'm adding a couple of checks to harden manipulation of the chunk count in TfwStr.flags, and I'm leaving the issue open for further investigation.
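As an illustration of what such hardening can look like, below is a minimal userspace sketch. It assumes the chunk count lives in the upper bits of the flags word and the low 8 bits hold flag bits. That bit split and all `demo_`/`DEMO_` names are hypothetical, not the actual Tempesta macros or the checks that were committed.

```c
#include <limits.h>

/* Assumed layout: low 8 bits are flag bits, the rest is the chunk count. */
#define DEMO_FLAGS_BITS  8u
#define DEMO_FLAGS_MASK  ((1u << DEMO_FLAGS_BITS) - 1u)
#define DEMO_CHUNKS_MAX  (UINT_MAX >> DEMO_FLAGS_BITS)

/* Hypothetical stand-in that keeps only the flags member of TfwStr. */
typedef struct {
    unsigned int flags;
} DemoStrFlags;

static unsigned int
demo_chunk_num(const DemoStrFlags *s)
{
    return s->flags >> DEMO_FLAGS_BITS;
}

/* Store a new count without touching the flag bits; reject overflow. */
static int
demo_chunk_num_set(DemoStrFlags *s, unsigned int n)
{
    if (n > DEMO_CHUNKS_MAX)
        return -1;
    s->flags = (s->flags & DEMO_FLAGS_MASK) | (n << DEMO_FLAGS_BITS);
    return 0;
}

/* Checked increment: fails instead of wrapping around. */
static int
demo_chunk_num_add(DemoStrFlags *s, unsigned int delta)
{
    unsigned int cur = demo_chunk_num(s);

    if (delta > DEMO_CHUNKS_MAX - cur)
        return -1;
    return demo_chunk_num_set(s, cur + delta);
}

/* Checked decrement: fails instead of underflowing. */
static int
demo_chunk_num_sub(DemoStrFlags *s, unsigned int delta)
{
    unsigned int cur = demo_chunk_num(s);

    if (delta > cur)
        return -1;
    return demo_chunk_num_set(s, cur - delta);
}
```

Routing every update through checked helpers means a bad count is reported at the point of the faulty arithmetic, not later when an unrelated function trips over the flags. In kernel code the error returns would more likely be assertions.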

krizhanovsky commented 8 years ago

The issue seems to be a memory corruption. We have fixed many memory corruptions since then and haven't seen the issue again.