Closed krizhanovsky closed 8 years ago
This issue may be connected with #328, as the bug occurs in tfw_str_eq_cstr() on a test of whether a TfwStr{} string is DUPLICATE. It must not be a duplicate when this function is called.
I have been unable to reproduce this bug so far. A possible culprit has been found, and a fix for it was pushed to master in 356f620.
My investigations:
TFW_STR_DUPLICATE is set only in tfw_http_msg_hdr_close(), which is called for headers only and only after tfw_http_msg_hdr_chunk_fixup(), so tfw_http_msg_field_chunk_fixup() (the cause of #328) can't encounter TFW_STR_DUPLICATE;
tfw_http_msg_hdr_close() checks parser.hdr for duplicates;
TFW_STR_CHUNKN_SUB() is also called only when getting a header value, which happens after header parsing, so again tfw_http_msg_field_chunk_fixup() can't encounter TFW_STR_DUPLICATE;
TfwHttpReq is allocated with zeroing, and h_tbl and parser.hdr are zeroed as well, so there shouldn't be dirty data.
Since two different functions, tfw_http_msg_field_chunk_fixup() (#328) and tfw_str_eq_cstr() (this issue), fail at the same assertion, while there were no crashes on other TfwStr members (like len, skb or ptr), it seems that the bug affects flags only. Moreover, #328 and this issue happened at different times and in different tests, so the probability of an improper build is actually small. KASAN also didn't find any issues.
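To make the "flags only" reasoning concrete, here is a minimal userspace sketch of a string descriptor that keeps both the chunk count and the boolean flags in a single flags word. The DemoStr type, the DEMO_* names and the 8-bit split are assumptions for illustration and are not copied from Tempesta's str.h. It shows that a write touching only flags can make the DUPLICATE test fire while len, skb and ptr stay valid.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical layout (an assumption, not Tempesta's real definitions):
 * the low 8 bits of `flags` are boolean flags, the remaining high bits
 * store the number of chunks.
 */
#define DEMO_STR_FBITS     8
#define DEMO_STR_FMASK     ((1U << DEMO_STR_FBITS) - 1)
#define DEMO_STR_DUPLICATE 0x01U

typedef struct {
	void         *ptr;   /* plain data or an array of chunks */
	void         *skb;   /* owning buffer, opaque in this sketch */
	size_t        len;   /* total length in bytes */
	unsigned int  flags; /* chunk count and flag bits share this word */
} DemoStr;

/* Extract the chunk count from the high bits of flags. */
static inline unsigned int demo_str_chunkn(const DemoStr *s)
{
	return s->flags >> DEMO_STR_FBITS;
}

/* Test the DUPLICATE bit in the low bits of flags. */
static inline int demo_str_is_duplicate(const DemoStr *s)
{
	return (s->flags & DEMO_STR_DUPLICATE) != 0;
}

/* Store a new chunk count while preserving the flag bits. */
static inline void demo_str_set_chunkn(DemoStr *s, unsigned int n)
{
	s->flags = (n << DEMO_STR_FBITS) | (s->flags & DEMO_STR_FMASK);
}
```

With such a layout, any code path that writes flags without masking (or any stray write landing on that word) is enough to trip an assertion like the one in tfw_str_eq_cstr(), with no visible damage to the other members.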
One more interesting thing is that the workload for the crash was pretty simple: just ab or wrk benchmarking. These tools send very simple requests without duplicate or long fields. The requests are:
GET / HTTP/1.0
Host: 172.16.0.5
User-Agent: ApacheBench/2.3
Accept: */*
GET / HTTP/1.1^M
Host: 172.16.0.5^M
for ab and wrk respectively.
Testing configuration (presumably the one used in the original tests):
server 127.0.0.1:8080;
cache 0;
sticky name=__cookie__ enforce;
frang_limits {
    request_rate 100000;
    request_burst 100000;
    connection_rate 20000;
    concurrent_connections 20000;
    client_header_timeout 20;
    client_body_timeout 10;
    http_uri_len 1024;
    http_field_len 256;
    http_ct_required false;
    http_methods get post head;
    http_ct_vals "text/plain" "text/html";
    http_header_chunk_cnt 10;
    http_body_chunk_cnt 0;
    http_header_cnt 16;
}
For now I am adding a couple of checks to harden manipulations of the chunk count in TfwStr.flags and leaving the issue open for further investigation.
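As a rough idea of what such hardening can look like, the sketch below adds range checks around chunk-count arithmetic on a packed flags word. The helper names, the 8-bit flag area and the error-code convention are all assumptions made for this example; the actual checks in the commit may differ (in kernel code they would more likely be BUG_ON()-style assertions).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical layout: low 8 bits are flags, high bits are the chunk count. */
#define DEMO_STR_FBITS  8
#define DEMO_STR_FMASK  ((1U << DEMO_STR_FBITS) - 1)
#define DEMO_STR_CN_MAX (UINT_MAX >> DEMO_STR_FBITS)

/* Read the chunk count from a packed flags word. */
static inline unsigned int demo_chunkn(unsigned int flags)
{
	return flags >> DEMO_STR_FBITS;
}

/*
 * Add n chunks. Returns 0 on success, or -1 (leaving *flags untouched)
 * if the count would not fit into the bits reserved for it.
 */
static inline int demo_chunkn_add(unsigned int *flags, unsigned int n)
{
	unsigned int cur = demo_chunkn(*flags);

	if (n > DEMO_STR_CN_MAX - cur)
		return -1;
	*flags = ((cur + n) << DEMO_STR_FBITS) | (*flags & DEMO_STR_FMASK);
	return 0;
}

/*
 * Subtract n chunks. Returns 0 on success, or -1 (leaving *flags
 * untouched) if that would drop the count below zero.
 */
static inline int demo_chunkn_sub(unsigned int *flags, unsigned int n)
{
	unsigned int cur = demo_chunkn(*flags);

	if (n > cur)
		return -1;
	*flags = ((cur - n) << DEMO_STR_FBITS) | (*flags & DEMO_STR_FMASK);
	return 0;
}
```

Both helpers rebuild the word from the masked flag bits and the new count, so a bad argument can neither wrap the counter nor disturb the flag bits.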
The issue looks like memory corruption. We have fixed many memory corruption bugs since then and haven't seen the issue again.
kernel:Kernel panic - not syncing: Fatal exception in interrupt
Oct 20 07:07:03 vip-1 kernel: kernel BUG at /root/tempesta/tempesta_fw/str.c:340!
Oct 20 07:07:03 vip-1 kernel: RIP [] tfw_str_eq_cstr+0xf6/0x100 [tempesta_fw]
Oct 20 07:07:03 vip-1 kernel: ---[ end trace d6e1045fde362742 ]---
Oct 20 07:07:03 vip-1 kernel: Kernel panic - not syncing: Fatal exception in interrupt
Oct 20 07:07:03 vip-1 kernel: drm_kms_helper: panic occurred, switching back to text console
Oct 20 07:07:03 vip-1 kernel: ------------[ cut here ]------------
Oct 20 07:07:03 vip-1 kernel: WARNING: at kernel/rcutree.c:388 rcu_eqs_enter+0x8b/0xa0()
Oct 20 07:07:03 vip-1 kernel: ---[ end trace d6e1045fde362743 ]---
Oct 20 07:07:03 vip-1 kernel: ------------[ cut here ]------------
Oct 20 07:07:03 vip-1 kernel: WARNING: at kernel/rcutree.c:528 rcu_eqs_exit+0x89/0xa0()
Oct 20 07:07:03 vip-1 kernel: ---[ end trace d6e1045fde362744 ]---
[10:27:09 PM] zuer rong: system hung .