tempesta-tech / tempesta

All-in-one solution for high performance web content delivery and advanced protection against DDoS and web attacks
https://tempesta-tech.com/
GNU General Public License v2.0

Hang with use-after-free in tfw_tls_conn_send+0x66 #2169

Closed. krizhanovsky closed this issue 3 weeks ago

krizhanovsky commented 1 month ago

Run Tempesta FW of version #2162 in a 4-CPU VM with 8 GB RAM. Run taskset --cpu-list 6,7,8,9,10,11,12,13 h2load -n 1000000 -c 1024 -t 8 -m 50 https://tempesta-tech.com and after about 60% of the requests I got the following KASAN report:

[ 3430.520348] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 504 0 "-" "h2load nghttp2/1.43.0"
[ 3430.521546] ==================================================================
[ 3430.522798] BUG: KASAN: use-after-free in tfw_tls_conn_send+0x66/0xe0 [tempesta_fw]
[ 3430.523807] Read of size 4 at addr ffff88814cfe5038 by task rs:main Q:Reg/10588
[ 3430.524916]
[ 3430.525508] CPU: 1 PID: 10588 Comm: rs:main Q:Reg Kdump: loaded Tainted: G           O      5.10.35-tfw-secperf+ #7
[ 3430.527046] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 3430.528468] Call Trace:
[ 3430.529382]  <IRQ>
[ 3430.530264]  dump_stack+0x7d/0xa3
[ 3430.531288]  print_address_description.constprop.0+0x1c/0x210
[ 3430.535477]  ? _raw_spin_lock_irqsave+0x6b/0xb0
[ 3430.536747]  ? _raw_spin_unlock_bh+0x20/0x20
[ 3430.537992]  ? tfw_tls_conn_send+0x66/0xe0 [tempesta_fw]
[ 3430.539391]  kasan_report.cold+0x1f/0x37
[ 3430.540686]  ? tfw_tls_conn_send+0x66/0xe0 [tempesta_fw]
[ 3430.542147]  tfw_tls_conn_send+0x66/0xe0 [tempesta_fw]
[ 3430.543630]  tfw_cli_conn_send+0x17/0x90 [tempesta_fw]
[ 3430.545166]  tfw_h2_resp_fwd+0x78/0x180 [tempesta_fw]
[ 3430.546677]  tfw_h2_send_err_resp+0xca/0xf0 [tempesta_fw]
[ 3430.548256]  ? tfw_h2_send_resp+0x90/0x90 [tempesta_fw]
[ 3430.549858]  ? ___ratelimit+0xf4/0x1b0
[ 3430.549926] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 200 0 "-" "h2load nghttp2/1.43.0"
[ 3430.551345]  tfw_http_send_err_resp+0x9c/0x160 [tempesta_fw]
[ 3430.555767]  ? tfw_http_cli_error_resp_and_log+0x3b0/0x3b0 [tempesta_fw]
[ 3430.557760]  tfw_http_req_zap_error+0x86/0x1d0 [tempesta_fw]
[ 3430.559691]  tfw_http_popreq+0x40c/0xae0 [tempesta_fw]
[ 3430.561601]  ? frang_resp_handler+0x1a8/0x640 [tempesta_fw]
[ 3430.563617]  ? tfw_http_req_zap_error+0x1d0/0x1d0 [tempesta_fw]
[ 3430.565699]  ? tfw_http_parse_resp+0xaa2/0x1c430 [tempesta_fw]
[ 3430.567780]  ? ktime_get_real_ts64+0xdc/0x130
[ 3430.568767] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 504 0 "-" "h2load nghttp2/1.43.0"
[ 3430.569771]  tfw_http_resp_cache+0x104/0x320 [tempesta_fw]
[ 3430.572784] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 504 0 "-" "h2load nghttp2/1.43.0"
[ 3430.575076]  ? tfw_gfsm_move+0xfa/0x4e0 [tempesta_fw]
[ 3430.575098]  ? tfw_http_conn_repair+0x8f0/0x8f0 [tempesta_fw]
[ 3430.575122]  tfw_http_resp_process+0x64f/0x7d0 [tempesta_fw]
[ 3430.578762] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 504 0 "-" "h2load nghttp2/1.43.0"
[ 3430.581128]  ? tfw_http_conn_drop+0x2c0/0x2c0 [tempesta_fw]
[ 3430.581134]  ? __ip_finish_output+0x151/0x340
[ 3430.584004] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 504 0 "-" "h2load nghttp2/1.43.0"
[ 3430.586442]  ? pcpu_get_vm_areas+0x200/0x1a00
[ 3430.586445]  ? ip_output+0xfb/0x1b0
[ 3430.586446]  ? ip_mc_output+0x420/0x420
[ 3430.586469]  tfw_http_msg_process_generic+0x14f/0x330 [tempesta_fw]
[ 3430.586492]  ? tfw_http_req_process+0xc00/0xc00 [tempesta_fw]
[ 3430.589843] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 504 0 "-" "h2load nghttp2/1.43.0"
[ 3430.592423]  ? __ip_queue_xmit+0x365/0x7d0
[ 3430.595881] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 504 0 "-" "h2load nghttp2/1.43.0"
[ 3430.598928]  ? tfw_http_msg_process+0x20/0xa0 [tempesta_fw]
[ 3430.598952]  tfw_connection_recv+0x14f/0x200 [tempesta_fw]
[ 3430.602091] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 504 0 "-" "h2load nghttp2/1.43.0"
[ 3430.604709]  ? tfw_connection_send+0x60/0x60 [tempesta_fw]
[ 3430.604733]  ss_tcp_process_skb.constprop.0+0x29c/0x590 [tempesta_fw]
[ 3430.604758]  ? __sk_close_locked+0x110/0x110 [tempesta_fw]
[ 3430.604761]  ? __tcp_send_ack+0x189/0x2a0
[ 3430.604784]  ss_tcp_process_data+0x1ef/0x310 [tempesta_fw]
[ 3430.604807]  ? ss_tcp_process_skb.constprop.0+0x590/0x590 [tempesta_fw]
[ 3430.604809]  ? __tcp_select_window+0x155/0x390
[ 3430.604831]  ss_tcp_data_ready+0x8f/0x1a0 [tempesta_fw]
[ 3430.604833]  tcp_rcv_established+0x8fb/0xe40
[ 3430.604836]  ? tcp_data_queue+0xac0/0xac0
[ 3430.604839]  ? bpf_skb_adjust_room+0x2f0/0x310
[ 3430.604841]  tcp_v4_do_rcv+0x24a/0x340
[ 3430.604843]  tcp_v4_rcv+0x13ac/0x14b0
[ 3430.604845]  ? tcp_v4_early_demux+0x2e0/0x2e0
[ 3430.604848]  ip_protocol_deliver_rcu+0x39/0x2f0
[ 3430.604851]  ip_local_deliver_finish+0xac/0xd0
[ 3430.604853]  ip_local_deliver+0xe9/0x1a0
[ 3430.604855]  ? ip_local_deliver_finish+0xd0/0xd0
[ 3430.604857]  ? ip_rcv_finish_core.constprop.0+0x166/0x5b0
[ 3430.604859]  ip_rcv+0x12e/0x140
[ 3430.604861]  ? ip_local_deliver+0x1a0/0x1a0
[ 3430.604863]  ? ip_rcv_finish_core.constprop.0+0x5b0/0x5b0
[ 3430.604865]  ? ip_local_deliver+0x1a0/0x1a0
[ 3430.604868]  __netif_receive_skb_one_core+0xb9/0xf0
[ 3430.604870]  ? __netif_receive_skb_core+0x1750/0x1750
[ 3430.604873]  ? _raw_spin_lock+0x65/0xa0
[ 3430.604874]  ? _raw_read_lock_irq+0x30/0x30
[ 3430.604877]  ? nohz_balancer_kick+0x64/0x430
[ 3430.604879]  process_backlog+0xe3/0x270
[ 3430.604881]  napi_poll+0x112/0x3a0
[ 3430.604882]  net_rx_action+0x14f/0x300
[ 3430.604884]  ? napi_poll+0x3a0/0x3a0
[ 3430.604887]  ? __kernel_fpu_begin_mask+0x102/0x120
[ 3430.604890]  ? __list_add_valid+0x2b/0xa0
[ 3430.604892]  __do_softirq+0xf4/0x430
[ 3430.604895]  asm_call_irq_on_stack+0xf/0x20
[ 3430.604896]  </IRQ>
[ 3430.604899]  do_softirq_own_stack+0x32/0x40
[ 3430.604901]  irq_exit_rcu+0xbb/0x110
[ 3430.604904]  sysvec_call_function_single+0x2e/0x80
[ 3430.604906]  asm_sysvec_call_function_single+0x12/0x20
[ 3430.604910] RIP: 0010:__asan_load4+0x55/0x80
[ 3430.604913] Code: 00 00 00 00 00 fc ff df 48 c1 e8 03 0f b6 04 08 84 c0 75 33 c3 48 b9 00 00 00 00 00 fc ff df 48 89 fe 48 c1 ee 03 80 3c 0e 00 <74> 0f 4c 89 c1 31 d2 be 04 00 00 00 e9 8a fb ff ff 48 c1 e8 03 0f
[ 3430.604915] RSP: 0018:ffff8880326d7598 EFLAGS: 00000246
[ 3430.604918] RAX: ffffffffc0979590 RBX: ffffffffc08ccf68 RCX: dffffc0000000000
[ 3430.604919] RDX: 0000000000000000 RSI: 1ffffffff812f2b1 RDI: ffffffffc097958d
[ 3430.604921] RBP: ffffffffc096ffd9 R08: ffffffffa9c88483 R09: ffffffffc098b5ac
[ 3430.604922] R10: fffffbfff81316b5 R11: 0000000000000001 R12: ffffffffc096ffd9
[ 3430.604923] R13: ffffffffc0982b45 R14: ffffffffc097958d R15: ffffffffc096ffd9
[ 3430.604956]  ? ext4_da_write_begin+0x398/0x5d0 [ext4]
[ 3430.604959]  ? __orc_find+0x63/0xc0
[ 3430.604961]  __orc_find+0x63/0xc0
[ 3430.604963]  unwind_next_frame+0x107/0x8d0
[ 3430.604994]  ? ext4_da_write_begin+0x399/0x5d0 [ext4]
[ 3430.604997]  ? __module_address+0x141/0x1d0
[ 3430.604998]  ? deref_stack_reg+0xb0/0xb0
[ 3430.605000]  ? is_module_text_address+0x76/0xa0
[ 3430.605030]  ? ext4_da_write_begin+0x399/0x5d0 [ext4]
[ 3430.605061]  ? ext4_da_write_begin+0x399/0x5d0 [ext4]
[ 3430.605063]  ? kernel_text_address+0x40/0x120
[ 3430.605065]  ? stack_trace_save+0xa0/0xa0
[ 3430.605067]  arch_stack_walk+0x61/0xb0
[ 3430.605098]  ? ext4_da_write_begin+0x399/0x5d0 [ext4]
[ 3430.605100]  stack_trace_save+0x7c/0xa0
[ 3430.605102]  ? create_prof_cpu_mask+0x20/0x20
[ 3430.605104]  kasan_save_stack+0x1b/0x40
[ 3430.605106]  ? kasan_save_stack+0x1b/0x40
[ 3430.605107]  ? __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 3430.605109]  ? kmem_cache_alloc+0xcd/0x220
[ 3430.605118]  ? jbd2__journal_start+0xe9/0x340 [jbd2]
[ 3430.605147]  ? __ext4_journal_start_sb+0x181/0x1a0 [ext4]
[ 3430.605177]  ? ext4_truncate+0x267/0x6f0 [ext4]
[ 3430.605208]  ? ext4_da_write_begin+0x399/0x5d0 [ext4]
[ 3430.605210]  ? stack_trace_save+0xa0/0xa0
[ 3430.605211]  ? arch_stack_walk+0x71/0xb0
[ 3430.605213]  ? _raw_spin_lock+0x65/0xa0
[ 3430.605214]  ? _raw_read_lock_irq+0x30/0x30
[ 3430.605216]  ? __list_add_valid+0x2b/0xa0
[ 3430.605218]  ? deactivate_slab+0x459/0x660
[ 3430.605226]  ? jbd2__journal_start+0xe9/0x340 [jbd2]
[ 3430.605227]  ? set_track+0x4b/0xc0
[ 3430.605229]  ? init_object+0x49/0x80
[ 3430.605231]  ? alloc_debug_processing+0x42/0x160
[ 3430.605233]  ? ___slab_alloc+0x239/0x470
[ 3430.605241]  ? jbd2__journal_start+0xe9/0x340 [jbd2]
[ 3430.605242]  ? kasan_unpoison_shadow+0x34/0x40
[ 3430.605244]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 3430.605246]  kmem_cache_alloc+0xcd/0x220
[ 3430.605254]  jbd2__journal_start+0xe9/0x340 [jbd2]
[ 3430.605285]  __ext4_journal_start_sb+0x181/0x1a0 [ext4]
[ 3430.605316]  ext4_truncate+0x267/0x6f0 [ext4]
[ 3430.605347]  ? ext4_punch_hole+0x780/0x780 [ext4]
[ 3430.605349]  ? down_read_killable+0x180/0x180
[ 3430.605380]  ext4_da_write_begin+0x399/0x5d0 [ext4]
[ 3430.605411]  ? ext4_write_begin+0x790/0x790 [ext4]
[ 3430.605412]  ? ktime_get_coarse_real_ts64+0x4a/0x90
[ 3430.605415]  generic_perform_write+0x146/0x290
[ 3430.605417]  ? add_page_wait_queue+0xf0/0xf0
[ 3430.605419]  ? current_time+0x160/0x160
[ 3430.605450]  ext4_buffered_write_iter+0x129/0x220 [ext4]
[ 3430.605453]  new_sync_write+0x22b/0x330
[ 3430.605455]  ? new_sync_read+0x320/0x320
[ 3430.605457]  ? __fsnotify_parent+0x398/0x4b0
[ 3430.605459]  ? rw_verify_area+0x51/0x120
[ 3430.605461]  vfs_write+0x2ae/0x350
[ 3430.605463]  ksys_write+0xa0/0x120
[ 3430.605465]  ? __ia32_sys_read+0x40/0x40
[ 3430.605467]  ? fpregs_assert_state_consistent+0x4d/0x60
[ 3430.605468]  do_syscall_64+0x2d/0x70
[ 3430.605470]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3430.605472] RIP: 0033:0x7f538af288bf
[ 3430.605476] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 49 c1 f7 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 8c c1 f7 ff 48
[ 3430.605477] RSP: 002b:00007f5389a51740 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[ 3430.605480] RAX: ffffffffffffffda RBX: 00007f537400b8b0 RCX: 00007f538af288bf
[ 3430.605481] RDX: 0000000000000087 RSI: 00007f537400bb90 RDI: 0000000000000008
[ 3430.605482] RBP: 00007f537400bb90 R08: 0000000000000000 R09: 0000000000000000
[ 3430.605483] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[ 3430.605485] R13: 000056002c54b4cc R14: 0000000000000087 R15: 00007f537400b8b0
[ 3430.605486]
[ 3430.605487] The buggy address belongs to the page:
[ 3430.605491] page:00000000cf53bb5b refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x14cfe5
[ 3430.605492] flags: 0x2fff80000000000()
[ 3430.605496] raw: 02fff80000000000 ffffea00054dbb48 ffffea0004f23648 0000000000000000
[ 3430.605498] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 3430.605499] page dumped because: kasan: bad access detected
[ 3430.605500]
[ 3430.605500] Memory state around the buggy address:
[ 3430.605503]  ffff88814cfe4f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3430.605504]  ffff88814cfe4f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3430.605506] >ffff88814cfe5000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3430.605507]                                         ^
[ 3430.605508]  ffff88814cfe5080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3430.605510]  ffff88814cfe5100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 3430.605510] ==================================================================
[ 3430.605512] Disabling lock debugging due to kernel taint
[ 3430.605545] BUG: unable to handle page fault for address: ffff88814cfe5038
[ 3430.605547] #PF: supervisor read access in kernel mode
[ 3430.605548] #PF: error_code(0x0000) - not-present page
[ 3430.605549] PGD 1bca01067 P4D 1bca01067 PUD 23f705067 PMD 23f69d067 PTE 800ffffeb301a060
[ 3430.605554] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
[ 3430.605557] CPU: 1 PID: 10588 Comm: rs:main Q:Reg Kdump: loaded Tainted: G    B      O      5.10.35-tfw-secperf+ #7
[ 3430.605559] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 3430.605580] RIP: 0010:tfw_tls_conn_send+0x66/0xe0 [tempesta_fw]
[ 3430.605581] Code: e8 3f 7b d3 e8 44 89 e2 48 8b bd 40 04 00 00 48 8d 73 10 83 e2 db e8 19 cb fe ff 41 89 c4 85 c0 75 51 4c 89 ef e8 1a 7a d3 e8 <8b> 43 18 41 89 c4 41 83 e4 04 74 3d 83 e0 20 4c 8d ad b8 04 00 00
[ 3430.605583] RSP: 0018:ffff8881f6489000 EFLAGS: 00010282
[ 3430.605584] RAX: 0000000000000001 RBX: ffff88814cfe5020 RCX: ffffffffaa930bad
[ 3430.605585] RDX: 1ffffffff595e9f0 RSI: 0000000000000286 RDI: ffffffffacaf4f80
[ 3430.605587] RBP: ffff888140083910 R08: 0000000000000001 R09: ffffffffacab5d67
[ 3430.605588] R10: fffffbfff5956bac R11: 0000000000000001 R12: 0000000000000000
[ 3430.605589] R13: ffff88814cfe5038 R14: ffff888140083dc0 R15: ffff88812f7e5250
[ 3430.605593] FS:  00007f5389a52640(0000) GS:ffff8881f6480000(0000) knlGS:0000000000000000
[ 3430.605594] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3430.605595] CR2: ffff88814cfe5038 CR3: 0000000125dd8000 CR4: 0000000000750ee0
[ 3430.605596] PKRU: 55555554
[ 3430.605600] Call Trace:
[ 3430.605602]  <IRQ>
[ 3430.605623]  tfw_cli_conn_send+0x17/0x90 [tempesta_fw]
[ 3430.605642]  tfw_h2_resp_fwd+0x78/0x180 [tempesta_fw]
[ 3430.605660]  tfw_h2_send_err_resp+0xca/0xf0 [tempesta_fw]
[ 3430.605682]  ? tfw_h2_send_resp+0x90/0x90 [tempesta_fw]
[ 3430.608659] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 504 0 "-" "h2load nghttp2/1.43.0"
[ 3430.611174]  ? ___ratelimit+0xf4/0x1b0
[ 3430.611199]  tfw_http_send_err_resp+0x9c/0x160 [tempesta_fw]
[ 3430.615415] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 504 0 "-" "h2load nghttp2/1.43.0"
[ 3430.618707]  ? tfw_http_cli_error_resp_and_log+0x3b0/0x3b0 [tempesta_fw]
[ 3430.618731]  tfw_http_req_zap_error+0x86/0x1d0 [tempesta_fw]
[ 3430.622044] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 504 0 "-" "h2load nghttp2/1.43.0"
[ 3430.625434]  tfw_http_popreq+0x40c/0xae0 [tempesta_fw]
[ 3430.625456]  ? frang_resp_handler+0x1a8/0x640 [tempesta_fw]
[ 3430.625480]  ? tfw_http_req_zap_error+0x1d0/0x1d0 [tempesta_fw]
[ 3430.629003] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 504 0 "-" "h2load nghttp2/1.43.0"
[ 3430.631984]  ? tfw_http_parse_resp+0xaa2/0x1c430 [tempesta_fw]
[ 3430.631990]  ? ktime_get_real_ts64+0xdc/0x130
[ 3430.636330] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 504 0 "-" "h2load nghttp2/1.43.0"
[ 3430.639826]  tfw_http_resp_cache+0x104/0x320 [tempesta_fw]
[ 3430.639851]  ? tfw_gfsm_move+0xfa/0x4e0 [tempesta_fw]
[ 3430.643668] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 200 0 "-" "h2load nghttp2/1.43.0"
[ 3430.646659]  ? tfw_http_conn_repair+0x8f0/0x8f0 [tempesta_fw]
[ 3430.646682]  tfw_http_resp_process+0x64f/0x7d0 [tempesta_fw]
[ 3430.646705]  ? tfw_http_conn_drop+0x2c0/0x2c0 [tempesta_fw]
[ 3430.651616] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 200 0 "-" "h2load nghttp2/1.43.0"
[ 3430.651723] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 200 0 "-" "h2load nghttp2/1.43.0"
[ 3430.654014]  ? __ip_finish_output+0x151/0x340
[ 3430.654022]  ? pcpu_get_vm_areas+0x200/0x1a00
[ 3430.692682] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 200 0 "-" "h2load nghttp2/1.43.0"
[ 3430.693188]  ? ip_output+0xfb/0x1b0
[ 3430.734766] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 200 0 "-" "h2load nghttp2/1.43.0"
[ 3430.738804]  ? ip_mc_output+0x420/0x420
[ 3430.738831]  tfw_http_msg_process_generic+0x14f/0x330 [tempesta_fw]
[ 3430.817926] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 200 0 "-" "h2load nghttp2/1.43.0"
[ 3430.820181]  ? tfw_http_req_process+0xc00/0xc00 [tempesta_fw]
[ 3430.989798] [tempesta fw] 192.168.100.1 "default" "GET / HTTP/2.0" 200 0 "-" "h2load nghttp2/1.43.0"

The system hangs.

The kernel (5.10.35 as of 09f6e2a043d099627d0617dbe34c2d1521540927) config is in the attachment.

tfw_tls_conn_send+0x66 corresponds to

static int
tfw_tls_conn_send(TfwConn *c, TfwMsg *msg)
{
        int r;
        TlsCtx *tls = tfw_tls_context(c);

        T_DBG("TLS %lu bytes (%u bytes)"
              " are to be sent on conn=%pK/sk_write_xmit=%pK ready=%d\n",
              msg->len, tls->io_out.msglen + TLS_HEADER_SIZE, c,
              c->sk->sk_write_xmit, ttls_xfrm_ready(tls));

        if (ttls_xfrm_ready(tls)) {
                msg->ss_flags |= SS_SKB_TYPE2F(TTLS_MSG_APPLICATION_DATA) |
                        SS_F_ENCRYPT;
        }

        r = ss_send(c->sk, &msg->skb_head,
                    msg->ss_flags & ~SS_F_CLOSE_FORCE);
        if (r)
                return r;

        /*
         * We can not send the alert on conn_drop hook, because the hook
         * is called on already closed socket.
         */
        if (msg->ss_flags & SS_F_CONN_CLOSE) {            // <===== tfw_tls_conn_send+0x66
                int close_type = msg->ss_flags & __SS_F_FORCE ?
                        TTLS_F_ST_CLOSE : TTLS_F_ST_SHUTDOWN;

                spin_lock(&tls->lock);
                r = ttls_close_notify(tls, close_type);
                spin_unlock(&tls->lock);
        }

        return r;
}

linux-config.txt

Our website's LXC container is used as the backend, and the Tempesta FW config is:

cache 2;
cache_fulfill * *;
cache_methods GET HEAD;
cache_ttl 3600;

listen 192.168.100.4:443 proto=h2,https;
#listen 192.168.100.4:80;

srv_group default {
        server 10.245.18.108:80;
    #server 127.0.0.1:80;
}

tls_certificate /root/tempesta/etc/tfw-root.crt;
tls_certificate_key /root/tempesta/etc/tfw-root.key;

#tls_certificate /home/tempesta/fullchain.pem;
#tls_certificate_key /home/tempesta/privkey.pem;
req_hdr_add X-Forwarded-Proto "https";
resp_hdr_set Strict-Transport-Security "max-age=31536000; includeSubDomains";

vhost default {
        tls_match_any_server_name;
        resp_hdr_set Cache-Control;
        nonidempotent GET * *;
        nonidempotent HEAD * *;
        nonidempotent POST * *;
        proxy_pass default;
}

frang_limits {
#        request_rate 200;
        http_method_override_allowed true;
        http_methods post put get;
}

#cache_purge;
#cache_purge_acl 3.72.94.204 127.0.0.1;

access_log on;

block_action attack reply;
block_action error reply;

http_chain {

        # Set cache_ttl for specific locations
        uri == "/blog/*" -> cache_ttl = 1;
        uri == "/knowledge-base/*" -> cache_ttl = 600;

        # Set Security
#        mark == 23 -> default;
        -> default;
}
kingluo commented 1 month ago

Found another suspicious KASAN bug:

[ 9877.219407] [tdb] Close table 'client0.tdb'
[ 9877.256909] [tdb] Close table 'sessions0.tdb'
[ 9877.344305] ==================================================================
[ 9877.346328] BUG: KASAN: use-after-free in _raw_spin_lock+0x6d/0xd0
[ 9877.347890] Write of size 4 at addr ffff88810349e318 by task apache2/24588

[ 9877.351382] CPU: 3 PID: 24588 Comm: apache2 Kdump: loaded Tainted: G           OE     5.10.35+ #1
[ 9877.353120] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 9877.354943] Call Trace:
[ 9877.356671]  <IRQ>
[ 9877.358487]  dump_stack+0x96/0xc4
[ 9877.360693]  print_address_description.constprop.0+0x21/0x220
[ 9877.362799]  ? _raw_spin_lock_irqsave+0x8e/0xf0
[ 9877.364630]  ? _raw_write_unlock_bh+0x30/0x30
[ 9877.366658]  ? __kernel_text_address+0x12/0x40
[ 9877.368481]  ? unwind_get_return_address+0x36/0x50
[ 9877.370231]  ? _raw_spin_lock+0x6d/0xd0
[ 9877.372079]  kasan_report.cold+0x20/0x37
[ 9877.373819]  ? _raw_spin_lock+0x6d/0xd0
[ 9877.375573]  check_memory_region+0x14d/0x1a0
[ 9877.377384]  __kasan_check_write+0x14/0x20
[ 9877.379092]  _raw_spin_lock+0x6d/0xd0
[ 9877.380781]  ? _raw_write_lock_irqsave+0xe0/0xe0
[ 9877.382589]  tdb_rec_get_alloc+0x95/0x210 [tempesta_db]
[ 9877.384449]  ? sk_prot_alloc+0x4c/0x190
[ 9877.386106]  ? sk_clone_lock+0x47/0x880
[ 9877.387724]  ? tdb_close+0x70/0x70 [tempesta_db]
[ 9877.389390]  ? ip6_input+0x15f/0x170
[ 9877.391026]  ? ipv6_rcv+0x236/0x250
[ 9877.392798]  ? __netif_receive_skb_one_core+0xeb/0x140
[ 9877.394445]  ? __netif_receive_skb+0x26/0xb0
[ 9877.396035]  ? __hash_calc+0x90/0x190 [tempesta_lib]
[ 9877.397667]  tfw_client_obtain+0x266/0x360 [tempesta_fw]
[ 9877.399380]  ? tfw_client_start+0x50/0x50 [tempesta_fw]
[ 9877.401174]  ? __tcp_push_pending_frames+0x5f/0x1a0
[ 9877.402732]  ? tcp_push+0x209/0x2b0
[ 9877.404274]  ? tcp_sendmsg_locked+0x13e3/0x15e0
[ 9877.405808]  ? _raw_spin_lock_irqsave+0x8e/0xf0
[ 9877.407333]  ? tfw_client_obtain+0x360/0x360 [tempesta_fw]
[ 9877.408981]  ? tfw_cfg_exit+0x10/0x10 [tempesta_fw]
[ 9877.410534]  ? enqueue_timer+0xbd/0x1c0
[ 9877.412146]  ? queued_spin_unlock+0x10/0x10 [tempesta_fw]
[ 9877.413666]  ? __ipv6_addr_type+0xa5/0x140
[ 9877.415134]  tfw_classify_conn_estab+0x17e/0x430 [tempesta_fw]
[ 9877.416579]  ? __hrtimer_init+0xce/0x100
[ 9877.418017]  ? frang_conn_limit+0x420/0x420 [tempesta_fw]
[ 9877.419418]  ? memset+0x3c/0x50
[ 9877.421032]  tempesta_new_clntsk+0x4d/0x80
[ 9877.422526]  tcp_v6_syn_recv_sock+0x743/0xed0
[ 9877.424115]  ? tcp_v6_do_rcv+0x780/0x780
[ 9877.425537]  ? kfree_skbmem+0x4b/0x200
[ 9877.426906]  ? tcp_parse_options+0x444/0x650
[ 9877.428250]  tcp_check_req+0x27b/0x940
[ 9877.429571]  ? tcp_child_process+0x2e0/0x2e0
[ 9877.430875]  ? memmove+0x4e/0x60
[ 9877.432264]  tcp_v6_rcv+0x10a1/0x1660
[ 9877.433593]  ? tdb_init_mappings+0x151/0x190 [tempesta_db]
[ 9877.434820]  ? tcp_v6_syn_recv_sock+0xed0/0xed0
[ 9877.436016]  ip6_protocol_deliver_rcu+0x18b/0x880
[ 9877.437228]  ip6_input+0x15f/0x170
[ 9877.438449]  ? ip6_input_finish+0x20/0x20
[ 9877.439863]  ? tfw_ipv6_nf_hook+0x1a3/0x230 [tempesta_fw]
[ 9877.441403]  ? nf_hook_slow+0xb3/0xe0
[ 9877.442614]  ipv6_rcv+0x236/0x250
[ 9877.443819]  ? ip6_rcv_finish+0xf0/0xf0
[ 9877.445078]  ? ip6_sublist_rcv+0x530/0x530
[ 9877.446310]  ? try_to_wake_up+0x41e/0xa70
[ 9877.447560]  ? ip6_rcv_finish+0xf0/0xf0
[ 9877.448793]  __netif_receive_skb_one_core+0xeb/0x140
[ 9877.450058]  ? napi_gro_receive+0x260/0x260
[ 9877.451656]  ? __kasan_check_write+0x14/0x20
[ 9877.452891]  ? _raw_spin_lock+0x7b/0xd0
[ 9877.454158]  __netif_receive_skb+0x26/0xb0
[ 9877.456574]  process_backlog+0xfe/0x290
[ 9877.457778]  net_rx_action+0x287/0x6b0
[ 9877.459018]  ? napi_complete_done+0x2c0/0x2c0
[ 9877.460295]  ? switch_fpu_return+0x120/0x120
[ 9877.461919]  __do_softirq+0xfe/0x383
[ 9877.463115]  asm_call_irq_on_stack+0x12/0x20
[ 9877.464308]  </IRQ>
[ 9877.465593]  do_softirq_own_stack+0x3d/0x50
[ 9877.466886]  do_softirq+0x72/0x90
[ 9877.468081]  __local_bh_enable_ip+0x50/0x60
[ 9877.469391]  ip6_finish_output2+0x593/0xaa0
[ 9877.470914]  ? ip6_fraglist_init+0x360/0x360
[ 9877.472343]  ? cgroup_bpf_release+0x400/0x400
[ 9877.473685]  ? loopback_xmit+0x15e/0x1b0
[ 9877.474858]  __ip6_finish_output+0x20e/0x380
[ 9877.476086]  ip6_finish_output+0x2d/0xe0
[ 9877.477264]  ip6_output+0x136/0x260
[ 9877.478420]  ? ip6_finish_output+0xe0/0xe0
[ 9877.479604]  ? netdev_core_pick_tx+0x120/0x120
[ 9877.480772]  ? cgroup_bpf_release+0x400/0x400
[ 9877.482073]  ip6_xmit+0x5d9/0xad0
[ 9877.483185]  ? ip6_finish_output2+0xaa0/0xaa0
[ 9877.484298]  ? ip6_dst_check+0xdd/0x1f0
[ 9877.485398]  ? __sk_dst_check+0x6d/0xd0
[ 9877.486492]  ? inet6_csk_route_socket+0x29e/0x3e0
[ 9877.487661]  ? inet6_csk_addr2sockaddr+0xd0/0xd0
[ 9877.488790]  ? __ip6_finish_output+0x20e/0x380
[ 9877.489892]  inet6_csk_xmit+0x17e/0x220
[ 9877.490988]  ? inet6_csk_route_socket+0x3e0/0x3e0
[ 9877.492118]  ? __asan_loadN+0xf/0x20
[ 9877.493221]  ? __tcp_select_window+0x147/0x370
[ 9877.494330]  __tcp_transmit_skb+0xb81/0x1730
[ 9877.495445]  ? __tcp_select_window+0x370/0x370
[ 9877.496553]  ? __kasan_check_write+0x14/0x20
[ 9877.497676]  ? _raw_write_unlock_bh+0x30/0x30
[ 9877.498768]  ? tcp_tfw_sk_write_xmit+0x2b/0x1d0
[ 9877.499861]  tcp_write_xmit+0x796/0x2860
[ 9877.500987]  ? iov_iter_advance+0x12b/0x5f0
[ 9877.502127]  ? bpf_skops_hdr_opt_len+0x210/0x210
[ 9877.503288]  __tcp_push_pending_frames+0x5f/0x1a0
[ 9877.504384]  tcp_push+0x209/0x2b0
[ 9877.505474]  tcp_sendmsg_locked+0x13e3/0x15e0
[ 9877.506579]  ? skb_do_copy_data_nocache+0x1b0/0x1b0
[ 9877.507690]  ? __kasan_check_write+0x14/0x20
[ 9877.508808]  ? _raw_spin_lock_bh+0x86/0xe0
[ 9877.509913]  tcp_sendmsg+0x2d/0x50
[ 9877.511139]  inet6_sendmsg+0x64/0xa0
[ 9877.512235]  ? inet6_ioctl+0x1b0/0x1b0
[ 9877.513331]  sock_sendmsg+0x80/0xa0
[ 9877.514424]  sock_write_iter+0x14c/0x200
[ 9877.515680]  ? sock_sendmsg+0xa0/0xa0
[ 9877.516771]  ? sk_get_meminfo+0x1a0/0x1a0
[ 9877.517857]  ? ptep_set_access_flags+0xd0/0xe0
[ 9877.518972]  ? pgd_free+0x130/0x130
[ 9877.520156]  new_sync_write+0x3aa/0x3c0
[ 9877.521242]  ? new_sync_read+0x3c0/0x3c0
[ 9877.522324]  ? __cgroup_bpf_run_filter_getsockopt+0x172/0x470
[ 9877.523511]  ? __seccomp_filter+0x12e/0xad0
[ 9877.524636]  ? rw_verify_area+0x7c/0x150
[ 9877.525743]  vfs_write+0x311/0x3d0
[ 9877.526846]  ksys_write+0x138/0x170
[ 9877.527940]  ? __ia32_sys_read+0x50/0x50
[ 9877.529043]  ? __secure_computing+0x74/0x150
[ 9877.530147]  __x64_sys_write+0x42/0x50
[ 9877.531350]  do_syscall_64+0x38/0x90
[ 9877.532456]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 9877.533620] RIP: 0033:0x7f3a02fc3887
[ 9877.534735] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 9877.537253] RSP: 002b:00007ffc0d84f728 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 9877.538623] RAX: ffffffffffffffda RBX: 00007f39ffee30a0 RCX: 00007f3a02fc3887
[ 9877.539966] RDX: 0000000000000064 RSI: 00007f39ffee3290 RDI: 000000000000000d
[ 9877.541282] RBP: 00007ffc0d84f760 R08: 0000000000000000 R09: 00007f3a02e139d8
[ 9877.542611] R10: 00007ffc0d84f714 R11: 0000000000000246 R12: 00007f39ffee3290
[ 9877.543917] R13: 00007f3a02d64048 R14: 000055fe2b1070f0 R15: 0000000000000000

[ 9877.546558] Allocated by task 24463:
[ 9877.547933]  kasan_save_stack+0x23/0x50
[ 9877.549235]  __kasan_kmalloc.constprop.0+0xcf/0xe0
[ 9877.550551]  kasan_kmalloc+0x9/0x10
[ 9877.551918]  kmem_cache_alloc_trace+0x132/0x240
[ 9877.553226]  tdb_open+0x300/0x390 [tempesta_db]
[ 9877.554558]  tfw_client_start+0x34/0x50 [tempesta_fw]
[ 9877.555908]  tfw_ctlfn_state_io+0x3d5/0x4d0 [tempesta_fw]
[ 9877.557287]  proc_sys_call_handler+0x227/0x390
[ 9877.558620]  proc_sys_write+0x13/0x20
[ 9877.559941]  new_sync_write+0x29d/0x3c0
[ 9877.561272]  vfs_write+0x311/0x3d0
[ 9877.562580]  ksys_write+0xcd/0x170
[ 9877.563877]  __x64_sys_write+0x42/0x50
[ 9877.565172]  do_syscall_64+0x38/0x90
[ 9877.566464]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 9877.569113] Freed by task 26066:
[ 9877.570441]  kasan_save_stack+0x23/0x50
[ 9877.571768]  kasan_set_track+0x20/0x30
[ 9877.573109]  kasan_set_free_info+0x1f/0x40
[ 9877.574401]  __kasan_slab_free+0x116/0x160
[ 9877.575691]  kasan_slab_free+0xe/0x10
[ 9877.577029]  slab_free_freelist_hook+0x6d/0x120
[ 9877.578349]  kfree+0xbe/0x460
[ 9877.579647]  tdb_close+0x69/0x70 [tempesta_db]
[ 9877.580990]  tfw_client_stop+0x23/0x30 [tempesta_fw]
[ 9877.582343]  tfw_mods_stop+0x51/0xe0 [tempesta_fw]
[ 9877.583736]  tfw_ctlfn_state_io+0x24a/0x4d0 [tempesta_fw]
[ 9877.585068]  proc_sys_call_handler+0x227/0x390
[ 9877.586413]  proc_sys_write+0x13/0x20
[ 9877.587737]  new_sync_write+0x29d/0x3c0
[ 9877.589036]  vfs_write+0x311/0x3d0
[ 9877.590331]  ksys_write+0xcd/0x170
[ 9877.591623]  __x64_sys_write+0x42/0x50
[ 9877.592919]  do_syscall_64+0x38/0x90
[ 9877.594213]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 9877.596865] The buggy address belongs to the object at ffff88810349e300
                which belongs to the cache kmalloc-192 of size 192
[ 9877.599694] The buggy address is located 24 bytes inside of
                192-byte region [ffff88810349e300, ffff88810349e3c0)
[ 9877.602303] The buggy address belongs to the page:
[ 9877.603605] page:00000000b431c515 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88810349e600 pfn:0x10349e
[ 9877.604957] flags: 0x17ffffc0000200(slab)
[ 9877.606217] raw: 0017ffffc0000200 ffffea00041ee588 ffffea000415a048 ffff888100043540
[ 9877.607553] raw: ffff88810349e600 000000000010000d 00000001ffffffff ffff88810ab3cc01
[ 9877.608876] page dumped because: kasan: bad access detected
[ 9877.610192] page->mem_cgroup:ffff88810ab3cc01

[ 9877.612805] Memory state around the buggy address:
[ 9877.614144]  ffff88810349e200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 9877.615553]  ffff88810349e280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 9877.616918] >ffff88810349e300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 9877.618236]                             ^
[ 9877.619544]  ffff88810349e380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 9877.620900]  ffff88810349e400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 9877.622259] ==================================================================
[ 9877.623627] Disabling lock debugging due to kernel taint
[ 9877.625243] BUG: unable to handle page fault for address: ffffed2da0420f9e
[ 9877.626408] #PF: supervisor read access in kernel mode
[ 9877.627535] #PF: error_code(0x0000) - not-present page
[ 9877.628887] PGD 13ffc9067 P4D 13ffc9067 PUD 0
[ 9877.630364] Oops: 0000 [#1] SMP KASAN NOPTI
[ 9877.631867] CPU: 3 PID: 24588 Comm: apache2 Kdump: loaded Tainted: G    B      OE     5.10.35+ #1
[ 9877.633383] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 9877.634964] RIP: 0010:check_memory_region+0xe3/0x1a0
[ 9877.636552] Code: 74 f2 41 bb 01 00 00 00 48 85 c0 75 5d 5b 44 89 d8 41 5c 41 5d 5d c3 48 85 d2 74 60 4c 01 e2 eb 09 48 83 c0 01 48 39 d0 74 52 <80> 38 00 74 f2 eb d1 41 bd 08 00 00 00 45 29 dd 4b 8d 54 25 00 eb
[ 9877.639969] RSP: 0018:ffff88810c9891b8 EFLAGS: 00010286
[ 9877.641744] RAX: ffffed2da0420f9e RBX: ffffed2da0420f9f RCX: ffffffffc1600aad
[ 9877.643519] RDX: ffffed2da0420f9f RSI: 0000000000000004 RDI: ffff896d02107cf0
[ 9877.645317] RBP: ffff88810c9891d0 R08: 0000000000000000 R09: ffff896d02107cf3
[ 9877.647169] R10: ffffed2da0420f9e R11: 0000000000000001 R12: ffffed2da0420f9e
[ 9877.648953] R13: ffff88810c989270 R14: ffff8880020f5e70 R15: ffff896d02107cf0
[ 9877.650728] FS:  00007f3a02e36780(0000) GS:ffff88810c980000(0000) knlGS:0000000000000000
[ 9877.652588] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9877.654393] CR2: ffffed2da0420f9e CR3: 0000000102b3c000 CR4: 0000000000350ee0
[ 9877.656268] Call Trace:
[ 9877.658128]  <IRQ>
[ 9877.659999]  __asan_loadN+0xf/0x20
[ 9877.661875]  tdb_htrie_descend+0x6d/0x130 [tempesta_db]
[ 9877.663746]  tdb_htrie_lookup+0xbe/0x100 [tempesta_db]
[ 9877.665601]  ? tdb_htrie_insert+0x790/0x790 [tempesta_db]
[ 9877.667475]  ? _raw_spin_lock+0x6d/0xd0
[ 9877.669372]  tdb_rec_get+0x85/0xf0 [tempesta_db]
[ 9877.671258]  ? tdb_entry_get_room+0xb0/0xb0 [tempesta_db]
[ 9877.673123]  ? _raw_write_lock_irqsave+0xe0/0xe0
[ 9877.674980]  tdb_rec_get_alloc+0xb5/0x210 [tempesta_db]
[ 9877.676852]  ? sk_prot_alloc+0x4c/0x190
[ 9877.678688]  ? tdb_close+0x70/0x70 [tempesta_db]
[ 9877.680560]  ? ip6_input+0x15f/0x170
[ 9877.682387]  ? ipv6_rcv+0x236/0x250
[ 9877.684204]  ? __netif_receive_skb_one_core+0xeb/0x140
[ 9877.685942]  ? __netif_receive_skb+0x26/0xb0
[ 9877.687632]  ? __hash_calc+0x90/0x190 [tempesta_lib]
[ 9877.689449]  tfw_client_obtain+0x266/0x360 [tempesta_fw]
[ 9877.691179]  ? tfw_client_start+0x50/0x50 [tempesta_fw]
[ 9877.692851]  ? __tcp_push_pending_frames+0x5f/0x1a0
[ 9877.694519]  ? tcp_push+0x209/0x2b0
[ 9877.696285]  ? tcp_sendmsg_locked+0x13e3/0x15e0
[ 9877.697937]  ? _raw_spin_lock_irqsave+0x8e/0xf0
[ 9877.699636]  ? tfw_client_obtain+0x360/0x360 [tempesta_fw]
[ 9877.701432]  ? tfw_cfg_exit+0x10/0x10 [tempesta_fw]
[ 9877.703094]  ? enqueue_timer+0xbd/0x1c0
[ 9877.704750]  ? queued_spin_unlock+0x10/0x10 [tempesta_fw]
[ 9877.706402]  ? __ipv6_addr_type+0xa5/0x140
[ 9877.708045]  tfw_classify_conn_estab+0x17e/0x430 [tempesta_fw]
[ 9877.709671]  ? __hrtimer_init+0xce/0x100
[ 9877.711450]  ? frang_conn_limit+0x420/0x420 [tempesta_fw]
[ 9877.713061]  ? memset+0x3c/0x50
[ 9877.714658]  tempesta_new_clntsk+0x4d/0x80
[ 9877.716252]  tcp_v6_syn_recv_sock+0x743/0xed0
[ 9877.717877]  ? tcp_v6_do_rcv+0x780/0x780
[ 9877.719485]  ? kfree_skbmem+0x4b/0x200
[ 9877.721279]  ? tcp_parse_options+0x444/0x650
[ 9877.722811]  tcp_check_req+0x27b/0x940
[ 9877.724309]  ? tcp_child_process+0x2e0/0x2e0
[ 9877.725677]  ? memmove+0x4e/0x60
[ 9877.727097]  tcp_v6_rcv+0x10a1/0x1660
[ 9877.728496]  ? tdb_init_mappings+0x151/0x190 [tempesta_db]
[ 9877.729861]  ? tcp_v6_syn_recv_sock+0xed0/0xed0
[ 9877.731295]  ip6_protocol_deliver_rcu+0x18b/0x880
[ 9877.732676]  ip6_input+0x15f/0x170
[ 9877.734050]  ? ip6_input_finish+0x20/0x20
[ 9877.735396]  ? tfw_ipv6_nf_hook+0x1a3/0x230 [tempesta_fw]
[ 9877.736704]  ? nf_hook_slow+0xb3/0xe0
[ 9877.737966]  ipv6_rcv+0x236/0x250
[ 9877.739201]  ? ip6_rcv_finish+0xf0/0xf0
[ 9877.740448]  ? ip6_sublist_rcv+0x530/0x530
[ 9877.741700]  ? try_to_wake_up+0x41e/0xa70
[ 9877.743382]  ? ip6_rcv_finish+0xf0/0xf0
[ 9877.744929]  __netif_receive_skb_one_core+0xeb/0x140
[ 9877.746348]  ? napi_gro_receive+0x260/0x260
[ 9877.747582]  ? __kasan_check_write+0x14/0x20
[ 9877.748814]  ? _raw_spin_lock+0x7b/0xd0
[ 9877.750069]  __netif_receive_skb+0x26/0xb0
[ 9877.751282]  process_backlog+0xfe/0x290
[ 9877.752483]  net_rx_action+0x287/0x6b0
[ 9877.753673]  ? napi_complete_done+0x2c0/0x2c0
[ 9877.754867]  ? switch_fpu_return+0x120/0x120
[ 9877.756058]  __do_softirq+0xfe/0x383
[ 9877.757251]  asm_call_irq_on_stack+0x12/0x20
[ 9877.758441]  </IRQ>
[ 9877.759624]  do_softirq_own_stack+0x3d/0x50
[ 9877.760818]  do_softirq+0x72/0x90
[ 9877.761995]  __local_bh_enable_ip+0x50/0x60
[ 9877.763171]  ip6_finish_output2+0x593/0xaa0
[ 9877.764340]  ? ip6_fraglist_init+0x360/0x360
[ 9877.765507]  ? cgroup_bpf_release+0x400/0x400
[ 9877.766677]  ? loopback_xmit+0x15e/0x1b0
[ 9877.767840]  __ip6_finish_output+0x20e/0x380
[ 9877.768999]  ip6_finish_output+0x2d/0xe0
[ 9877.770151]  ip6_output+0x136/0x260
[ 9877.771292]  ? ip6_finish_output+0xe0/0xe0
[ 9877.772435]  ? netdev_core_pick_tx+0x120/0x120
[ 9877.773573]  ? cgroup_bpf_release+0x400/0x400
[ 9877.774850]  ip6_xmit+0x5d9/0xad0
[ 9877.775970]  ? ip6_finish_output2+0xaa0/0xaa0
[ 9877.777101]  ? ip6_dst_check+0xdd/0x1f0
[ 9877.778201]  ? __sk_dst_check+0x6d/0xd0
[ 9877.779295]  ? inet6_csk_route_socket+0x29e/0x3e0
[ 9877.780437]  ? inet6_csk_addr2sockaddr+0xd0/0xd0
[ 9877.781587]  ? __ip6_finish_output+0x20e/0x380
[ 9877.782690]  inet6_csk_xmit+0x17e/0x220
[ 9877.783787]  ? inet6_csk_route_socket+0x3e0/0x3e0
[ 9877.784936]  ? __asan_loadN+0xf/0x20
[ 9877.786040]  ? __tcp_select_window+0x147/0x370
[ 9877.787149]  __tcp_transmit_skb+0xb81/0x1730
[ 9877.788278]  ? __tcp_select_window+0x370/0x370
[ 9877.789448]  ? __kasan_check_write+0x14/0x20
[ 9877.790581]  ? _raw_write_unlock_bh+0x30/0x30
[ 9877.791728]  ? tcp_tfw_sk_write_xmit+0x2b/0x1d0
[ 9877.792968]  tcp_write_xmit+0x796/0x2860
[ 9877.794075]  ? iov_iter_advance+0x12b/0x5f0
[ 9877.795230]  ? bpf_skops_hdr_opt_len+0x210/0x210
[ 9877.796321]  __tcp_push_pending_frames+0x5f/0x1a0
[ 9877.797416]  tcp_push+0x209/0x2b0
[ 9877.798506]  tcp_sendmsg_locked+0x13e3/0x15e0
[ 9877.799606]  ? skb_do_copy_data_nocache+0x1b0/0x1b0
[ 9877.800715]  ? __kasan_check_write+0x14/0x20
[ 9877.801829]  ? _raw_spin_lock_bh+0x86/0xe0
[ 9877.802936]  tcp_sendmsg+0x2d/0x50
[ 9877.804031]  inet6_sendmsg+0x64/0xa0
[ 9877.805124]  ? inet6_ioctl+0x1b0/0x1b0
[ 9877.806255]  sock_sendmsg+0x80/0xa0
[ 9877.807346]  sock_write_iter+0x14c/0x200
[ 9877.808433]  ? sock_sendmsg+0xa0/0xa0
[ 9877.809520]  ? sk_get_meminfo+0x1a0/0x1a0
[ 9877.810623]  ? ptep_set_access_flags+0xd0/0xe0
[ 9877.811713]  ? pgd_free+0x130/0x130
[ 9877.812796]  new_sync_write+0x3aa/0x3c0
[ 9877.813878]  ? new_sync_read+0x3c0/0x3c0
[ 9877.814960]  ? __cgroup_bpf_run_filter_getsockopt+0x172/0x470
[ 9877.816063]  ? __seccomp_filter+0x12e/0xad0
[ 9877.817167]  ? rw_verify_area+0x7c/0x150
[ 9877.818266]  vfs_write+0x311/0x3d0
[ 9877.819361]  ksys_write+0x138/0x170
[ 9877.820452]  ? __ia32_sys_read+0x50/0x50
[ 9877.821545]  ? __secure_computing+0x74/0x150
[ 9877.822640]  __x64_sys_write+0x42/0x50
[ 9877.823732]  do_syscall_64+0x38/0x90
[ 9877.824818]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 9877.825944] RIP: 0033:0x7f3a02fc3887
[ 9877.827050] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 9877.829502] RSP: 002b:00007ffc0d84f728 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 9877.830777] RAX: ffffffffffffffda RBX: 00007f39ffee30a0 RCX: 00007f3a02fc3887
[ 9877.832068] RDX: 0000000000000064 RSI: 00007f39ffee3290 RDI: 000000000000000d
[ 9877.833367] RBP: 00007ffc0d84f760 R08: 0000000000000000 R09: 00007f3a02e139d8
[ 9877.834675] R10: 00007ffc0d84f714 R11: 0000000000000246 R12: 00007f39ffee3290
[ 9877.835979] R13: 00007f3a02d64048 R14: 000055fe2b1070f0 R15: 0000000000000000
[ 9877.837315] Modules linked in: tempesta_fw(OE) tempesta_db(OE) tempesta_tls(OE) tempesta_lib(OE) tcp_diag inet_diag veth tls sha256_ssse3 sha512_ssse3 btrfs blake2b_generic zstd_compress xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype nft_compat br_netfilter nft_masq nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc nf_tables nfnetlink overlay intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl ppdev parport_pc parport i2c_piix4 e1000 qemu_fw_cfg floppy binfmt_misc dm_multipath sch_fq_codel scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore msr ip_tables x_tables autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear bochs_drm drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec aesni_intel glue_helper crypto_simd psmouse input_leds cryptd drm serio_raw
[ 9877.837435]  pata_acpi
[ 9877.838240] [tdb] Close table 'cache0.tdb'
[ 9877.848914]  mac_hid [last unloaded: tempesta_lib]
[ 9877.849020] CR2: ffffed2da0420f9e
kingluo commented 1 month ago

@krizhanovsky The root cause has been located.

The message sending may happen in a softirq on another CPU, and it may complete (and free the message) before the next statement referencing msg->ss_flags runs. That statement then reads a freed message, triggering the KASAN report.

https://github.com/tempesta-tech/tempesta/blob/910f3a10b11b79d285f7640757733e1e6c1d4dab/fw/tls.c#L811C1-L827C3
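
To make the race easier to see, here is a stand-alone userspace sketch of the same hazard (illustrative only, not Tempesta code and not the actual call chain): once the message has been handed to another execution context that may free it, the sender may only use data it copied beforehand.

/* Illustrative userspace model of the race; field and flag values are made up. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct msg {
    int ss_flags;
};

/* Models the softirq on the other CPU: it transmits and then frees the message. */
static void *
tx_thread(void *p)
{
    free(p);
    return NULL;
}

int
main(void)
{
    pthread_t tx;
    struct msg *m = calloc(1, sizeof(*m));
    int ss_flags;

    if (!m)
        return 1;
    /* Copy everything that is needed *before* the handoff. */
    ss_flags = m->ss_flags;

    pthread_create(&tx, NULL, tx_thread, m);
    /*
     * From here on only the saved copy may be used: reading m->ss_flags
     * now would be the same kind of use-after-free KASAN reported above.
     */
    if (ss_flags & 0x1)
        puts("would send the TLS close notify");

    pthread_join(tx, NULL);
    return 0;
}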

How to reproduce

The patch below sets up the conditions to reproduce the scenario above: it forces the send to be scheduled on the next CPU and delays the current CPU by computing MD5 hashes, so that the message has already been destroyed by the time it is referenced again.

reproduction patch

diff --git a/fw/http.c b/fw/http.c
index 38750670..d6ba2995 100644
--- a/fw/http.c
+++ b/fw/http.c
@@ -997,6 +997,7 @@ tfw_http_resp_pair_free_and_put_conn(void *opaque_data)
    BUG_ON(!req || !req->conn);
    tfw_connection_put(req->conn);
    tfw_http_resp_pair_free(req);
+   printk("--> msg_free\n");
 }

 /*
diff --git a/fw/sock.c b/fw/sock.c
index eaff7080..917f06f3 100644
--- a/fw/sock.c
+++ b/fw/sock.c
@@ -570,7 +570,7 @@ ss_send(struct sock *sk, struct sk_buff **skb_head, int flags)
     * leakage, so we never use synchronous sending.
     */
    sock_hold(sk);
-   if (ss_wq_push(&sw, cpu)) {
+   if (ss_wq_push(&sw, (cpu+1)%4)) {
        T_DBG2("Cannot schedule socket %p for transmission"
               " (queue size %d)\n", sk,
               tfw_wq_size(&per_cpu(si_wq, cpu)));
diff --git a/fw/tls.c b/fw/tls.c
index 51d9ebcb..afb7e047 100644
--- a/fw/tls.c
+++ b/fw/tls.c
@@ -787,6 +787,25 @@ tfw_tls_conn_drop(TfwConn *c)
    tfw_conn_hook_call(TFW_FSM_HTTP, c, conn_drop);
 }

+bool
+md5_hash(char *result, char* data, size_t len)
+{
+   struct shash_desc *desc;
+   struct crypto_shash *shash = crypto_alloc_shash("md5", 0, 0);
+   int size = sizeof(struct shash_desc) + crypto_shash_descsize(shash);
+   desc = (struct shash_desc*) kmalloc(size, GFP_KERNEL);
+   if (desc == NULL)
+       return false;
+   desc->tfm = shash;
+
+   crypto_shash_init(desc);
+   crypto_shash_update(desc, data, len);
+   crypto_shash_final(desc, result);
+   crypto_free_shash(desc->tfm);
+
+   return true;
+}
+
 /**
  * Send the @msg skbs as is - tcp_write_xmit() will care about encryption,
  * but attach TLS alert message at the end of the skb list to notify the peer
@@ -798,6 +817,10 @@ tfw_tls_conn_send(TfwConn *c, TfwMsg *msg)
    int r;
    TlsCtx *tls = tfw_tls_context(c);

+   int i;
+   char data[100];
+   char result[100];
+
    T_DBG("TLS %lu bytes (%u bytes)"
          " are to be sent on conn=%pK/sk_write_xmit=%pK ready=%d\n",
          msg->len, tls->io_out.msglen + TLS_HEADER_SIZE, c,
@@ -813,6 +836,12 @@ tfw_tls_conn_send(TfwConn *c, TfwMsg *msg)
    if (r)
        return r;

+   for (i = 0; i < 10000; i++) {
+       md5_hash(result, data, 100);
+   }
+
+   printk("---> tls_con_send ret\n");
+
    /*
     * We can not send the alert on conn_drop hook, because the hook
     * is called on already closed socket.

demo backend

// trailers.go
package main

import (
    "log"
    "net/http"
    "strings"
)

func main() {
    mux := http.NewServeMux()
    str := strings.Repeat("a", 32)
    data := strings.Repeat("a", 1024*1024*10)
    _, _ = str, data // unused in this minimal demo; referenced so the snippet compiles
    mux.HandleFunc("/trailers", func(w http.ResponseWriter, req *http.Request) {
        w.Header().Set("Trailer", "foobar")
        w.Header().Add("Trailer", "X-Token1")
        w.Header().Set("Content-Type", "text/html")
        w.Header().Set("Server", "Deproxy Server")
        w.Header().Set("Foobbxxxx", "Foobbxxxx")
        w.WriteHeader(http.StatusOK)
        w.Header()["Trailer:foobar"] = []string{"foobar"}
        w.Header().Set("X-Token1", "value")
    })
    log.Fatal(http.ListenAndServe(":8000", mux))
}

tempesta.conf

listen 443 proto=https,h2;
server 127.0.0.1:8000;
tls_certificate /home/kingluo/tempesta/server.crt;
tls_certificate_key /home/kingluo/tempesta/server.key;
tls_match_any_server_name;
cache 2;
cache_fulfill * *;
#server_forward_timeout 5;

reproduction procedure

cd tempesta
sudo -E TFW_CFG_PATH=$HOME/tempesta.conf ./scripts/tempesta.sh --restart
go run trailers.go &
curl https://127.0.0.1/trailers --http2 -k --raw -s -v

dmesg

[  389.799144] ---> tls_con_send ret
[  389.961343] ---> tls_con_send ret
[  390.008512] --> msg_free
[  390.205745] ---> tls_con_send ret
[  390.206767] ==================================================================
[  390.208025] BUG: KASAN: use-after-free in _raw_spin_lock+0x6d/0xd0
[  390.209337] Write of size 4 at addr ffff888106f758a8 by task swapper/0/0

[  390.212064] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G        W  OE     5.10.35+ #1
[  390.213460] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[  390.214888] Call Trace:
[  390.216428]  <IRQ>
[  390.217842]  dump_stack+0x96/0xc4
[  390.219261]  print_address_description.constprop.0+0x21/0x220
[  390.220702]  ? _raw_spin_lock_irqsave+0x8e/0xf0
[  390.222136]  ? _raw_write_unlock_bh+0x30/0x30
[  390.223592]  ? tfw_http_sess_learn+0x12a/0x530 [tempesta_fw]
[  390.225033]  ? _raw_spin_lock+0x6d/0xd0
[  390.226473]  kasan_report.cold+0x20/0x37
[  390.227944]  ? _raw_spin_lock+0x6d/0xd0
[  390.229374]  check_memory_region+0x14d/0x1a0
[  390.230848]  __kasan_check_write+0x14/0x20
[  390.232305]  _raw_spin_lock+0x6d/0xd0
[  390.233720]  ? _raw_write_lock_irqsave+0xe0/0xe0
[  390.235173]  ? tfw_connection_send+0x59/0x60 [tempesta_fw]
[  390.236636]  tfw_cli_conn_send+0x2a/0x90 [tempesta_fw]
[  390.238095]  tfw_h2_resp_fwd+0x80/0x180 [tempesta_fw]
[  390.239585]  tfw_http_resp_cache_cb+0x8d/0x4f0 [tempesta_fw]
[  390.241066]  tfw_cache_add+0x185/0x380 [tempesta_fw]
[  390.242549]  ? tfw_http_req_cache_cb+0xbc0/0xbc0 [tempesta_fw]
[  390.244173]  ? __cache_add_node+0x1a0/0x1a0 [tempesta_fw]
[  390.245674]  ? _raw_spin_lock+0x7b/0xd0
[  390.247578]  ? frang_http_req_incomplete_hdrs_check.isra.0+0x290/0x290 [tempesta_fw]
[  390.249148]  ? tfw_http_req_zap_error+0x22/0x1c0 [tempesta_fw]
[  390.250727]  tfw_cache_process+0x491/0x620 [tempesta_fw]
[  390.252320]  ? tfw_http_req_cache_cb+0xbc0/0xbc0 [tempesta_fw]
[  390.253911]  ? tfw_wq_tasklet+0x200/0x200 [tempesta_fw]
[  390.255514]  ? tfw_gfsm_move+0xb7/0x300 [tempesta_fw]
[  390.257106]  tfw_http_resp_cache+0x2ff/0x400 [tempesta_fw]
[  390.258703]  ? tfw_http_resp_gfsm+0xf0/0xf0 [tempesta_fw]
[  390.260368]  ? tfw_gfsm_move+0xb7/0x300 [tempesta_fw]
[  390.261963]  tfw_http_msg_process_generic+0x9d9/0xbd0 [tempesta_fw]
[  390.263500]  ? tfw_http_req_process+0x1110/0x1110 [tempesta_fw]
[  390.264980]  ? __alloc_skb+0x272/0x330
[  390.266447]  ? __kasan_check_write+0x14/0x20
[  390.267932]  ? __coalesce_frag+0x136/0x370 [tempesta_fw]
[  390.269430]  tfw_http_msg_process+0x81/0xc0 [tempesta_fw]
[  390.270940]  tfw_connection_recv+0x188/0x240 [tempesta_fw]
[  390.272461]  ? tfw_connection_send+0x60/0x60 [tempesta_fw]
[  390.274020]  ss_tcp_process_data+0x45f/0x840 [tempesta_fw]
[  390.275519]  ? ss_do_close+0x810/0x810 [tempesta_fw]
[  390.277057]  ? __kasan_check_read+0x11/0x20
[  390.278764]  ? tcp_grow_window+0xce/0x330
[  390.280317]  ss_tcp_data_ready+0xa9/0x1b0 [tempesta_fw]
[  390.281817]  tcp_data_ready+0x73/0x210
[  390.283311]  tcp_data_queue+0x1085/0x1d80
[  390.284766]  ? tcp_validate_incoming+0x1fa/0xa30
[  390.286215]  ? tcp_data_ready+0x210/0x210
[  390.287632]  ? tcp_try_keep_open+0x130/0x130
[  390.288994]  ? __ip_finish_output+0x340/0x340
[  390.290311]  ? tempesta_sock_tcp_rcv+0x6a/0x80
[  390.291646]  ? recalibrate_cpu_khz+0x10/0x10
[  390.293015]  ? ktime_get+0x4f/0xb0
[  390.294363]  tcp_rcv_established+0x427/0xf50
[  390.295697]  ? tcp_parse_md5sig_option+0x20/0xb0
[  390.297014]  ? tcp_data_queue+0x1d80/0x1d80
[  390.298320]  ? __kasan_check_read+0x11/0x20
[  390.299633]  tcp_v4_do_rcv+0x25f/0x360
[  390.300928]  tcp_v4_rcv+0x1600/0x1770
[  390.302219]  ? tcp_v4_early_demux+0x2f0/0x2f0
[  390.303555]  ip_protocol_deliver_rcu+0x46/0x2e0
[  390.304864]  ip_local_deliver_finish+0xc6/0xe0
[  390.306168]  ip_local_deliver+0x1f5/0x210
[  390.307454]  ? ip_local_deliver_finish+0xe0/0xe0
[  390.308904]  ? nf_nat_inet_fn+0x2b3/0x440 [nf_nat]
[  390.310321]  ? ip_protocol_deliver_rcu+0x2e0/0x2e0
[  390.311681]  ? ip_rcv_finish_core.constprop.0+0x17c/0x890
[  390.312965]  ip_rcv_finish+0xcf/0xf0
[  390.314238]  ip_rcv+0x16d/0x180
[  390.315534]  ? ip_local_deliver+0x210/0x210
[  390.316889]  ? ip_rcv_finish_core.constprop.0+0x890/0x890
[  390.318146]  ? ttwu_do_activate+0xa4/0x170
[  390.319399]  ? ip_local_deliver+0x210/0x210
[  390.320605]  __netif_receive_skb_one_core+0x132/0x140
[  390.321811]  ? napi_gro_receive+0x260/0x260
[  390.323008]  ? __kasan_check_write+0x14/0x20
[  390.324254]  ? _raw_spin_lock+0x7b/0xd0
[  390.325473]  __netif_receive_skb+0x26/0xb0
[  390.327116]  process_backlog+0xfe/0x290
[  390.328300]  net_rx_action+0x287/0x6b0
[  390.329475]  ? napi_complete_done+0x2c0/0x2c0
[  390.330623]  ? switch_fpu_return+0x120/0x120
[  390.331770]  __do_softirq+0xfe/0x383
[  390.332889]  asm_call_irq_on_stack+0x12/0x20
[  390.334051]  </IRQ>
[  390.335163]  do_softirq_own_stack+0x3d/0x50
[  390.336296]  irq_exit_rcu+0xc7/0x120
[  390.337410]  sysvec_call_function_single+0x3a/0x90
[  390.338525]  asm_sysvec_call_function_single+0x12/0x20
[  390.339789] RIP: 0010:native_safe_halt+0xe/0x10
[  390.340919] Code: 49 8b 04 24 a8 08 0f 85 71 ff ff ff e9 0f ff ff ff cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 74 24 52 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 64 24 52 00 f4 c3 cc cc 0f 1f 44 00
[  390.343490] RSP: 0018:ffffffffb4e07dc8 EFLAGS: 00000202
[  390.344741] RAX: ffffffffb3f36500 RBX: ffffffffb4e1b100 RCX: ffffffffb3f20d75
[  390.346033] RDX: 00000000008841a2 RSI: 0000000000000004 RDI: ffff888108435040
[  390.347351] RBP: ffffffffb4e07dd0 R08: 0000000000000001 R09: ffff888108435043
[  390.348650] R10: ffffed1021086a08 R11: 0000000000000001 R12: 0000000000000000
[  390.349937] R13: ffffffffb56e3020 R14: 0000000000000000 R15: 0000000000000000
[  390.351228]  ? __cpuidle_text_start+0x8/0x8
[  390.352518]  ? rcu_eqs_enter.constprop.0+0x85/0xa0
[  390.353809]  ? default_idle+0xe/0x20
[  390.355194]  arch_cpu_idle+0x15/0x20
[  390.356494]  default_idle_call+0x66/0x160
[  390.357772]  do_idle+0x36f/0x430
[  390.359052]  ? asm_sysvec_call_function_single+0x12/0x20
[  390.360635]  ? arch_cpu_idle_exit+0x40/0x40
[  390.362262]  cpu_startup_entry+0x20/0x30
[  390.363638]  rest_init+0xc8/0xce
[  390.364938]  arch_call_rest_init+0xe/0x1b
[  390.366342]  start_kernel+0x380/0x39d
[  390.367675]  x86_64_start_reservations+0x24/0x26
[  390.368985]  x86_64_start_kernel+0x7a/0x7e
[  390.370282]  secondary_startup_64_no_verify+0xb0/0xbb

[  390.372969] Allocated by task 1583:
[  390.374269]  kasan_save_stack+0x23/0x50
[  390.375568]  __kasan_kmalloc.constprop.0+0xcf/0xe0
[  390.376905]  kasan_slab_alloc+0xe/0x10
[  390.378197]  kmem_cache_alloc+0xef/0x230
[  390.379520]  tfw_cli_conn_alloc+0x45/0x140 [tempesta_fw]
[  390.380831]  tfw_sock_clnt_new+0x1a9/0x450 [tempesta_fw]
[  390.382131]  ss_tcp_state_change+0x179/0x310 [tempesta_fw]
[  390.383427]  tcp_rcv_state_process+0x17b7/0x1f50
[  390.384721]  tcp_child_process+0x15c/0x2e0
[  390.386018]  tcp_v4_rcv+0x1179/0x1770
[  390.387344]  ip_protocol_deliver_rcu+0x46/0x2e0
[  390.388632]  ip_local_deliver_finish+0xc6/0xe0
[  390.389926]  ip_local_deliver+0x1f5/0x210
[  390.391212]  ip_rcv_finish+0xcf/0xf0
[  390.392484]  ip_rcv+0x16d/0x180
[  390.393831]  __netif_receive_skb_one_core+0x132/0x140
[  390.395190]  __netif_receive_skb+0x26/0xb0
[  390.396475]  process_backlog+0xfe/0x290
[  390.397742]  net_rx_action+0x287/0x6b0
[  390.399020]  __do_softirq+0xfe/0x383

[  390.401525] Freed by task 20:
[  390.402810]  kasan_save_stack+0x23/0x50
[  390.404059]  kasan_set_track+0x20/0x30
[  390.405297]  kasan_set_free_info+0x1f/0x40
[  390.406472]  __kasan_slab_free+0x116/0x160
[  390.407578]  kasan_slab_free+0xe/0x10
[  390.408665]  slab_free_freelist_hook+0x6d/0x120
[  390.409745]  kmem_cache_free+0x105/0x410
[  390.410869]  tfw_cli_conn_free+0xe1/0x150 [tempesta_fw]
[  390.411967]  tfw_cli_conn_release+0x89/0xc0 [tempesta_fw]
[  390.413062]  tfw_tls_conn_dtor+0x255/0x310 [tempesta_fw]
[  390.414166]  tfw_sock_clnt_drop+0x211/0x220 [tempesta_fw]
[  390.415266]  ss_conn_drop_guard_exit+0x65/0xc0 [tempesta_fw]
[  390.416359]  tcp_done+0x14d/0x1e0
[  390.417433]  tcp_rcv_state_process+0x16c3/0x1f50
[  390.418554]  tcp_v4_do_rcv+0x164/0x360
[  390.419648]  tcp_v4_rcv+0x1600/0x1770
[  390.420733]  ip_protocol_deliver_rcu+0x46/0x2e0
[  390.421817]  ip_local_deliver_finish+0xc6/0xe0
[  390.422921]  ip_local_deliver+0x1f5/0x210
[  390.423990]  ip_rcv_finish+0xcf/0xf0
[  390.425053]  ip_rcv+0x16d/0x180
[  390.426112]  __netif_receive_skb_one_core+0x132/0x140
[  390.427220]  __netif_receive_skb+0x26/0xb0
[  390.428323]  process_backlog+0xfe/0x290
[  390.429400]  net_rx_action+0x287/0x6b0
[  390.430464]  __do_softirq+0xfe/0x383

[  390.432576] The buggy address belongs to the object at ffff888106f75440
                which belongs to the cache tfw_h2_conn_cache of size 2568
[  390.434882] The buggy address is located 1128 bytes inside of
                2568-byte region [ffff888106f75440, ffff888106f75e48)
[  390.437144] The buggy address belongs to the page:
[  390.438272] page:0000000019f7ee9c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x106f70
[  390.439477] head:0000000019f7ee9c order:3 compound_mapcount:0 compound_pincount:0
[  390.440693] flags: 0x17ffffc0010200(slab|head)
[  390.441888] raw: 0017ffffc0010200 dead000000000100 dead000000000122 ffff888101e72f00
[  390.443145] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
[  390.444437] page dumped because: kasan: bad access detected

[  390.446958] Memory state around the buggy address:
[  390.448225]  ffff888106f75780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  390.449578]  ffff888106f75800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  390.450873] >ffff888106f75880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  390.452152]                                   ^
[  390.453469]  ffff888106f75900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  390.454785]  ffff888106f75980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  390.456096] ==================================================================
[  390.457415] Disabling lock debugging due to kernel taint

solution (verified)

Save msg->ss_flags before the send, so it can still be read even if msg has already been deallocated.

diff --git a/fw/tls.c b/fw/tls.c
index 51d9ebcb..8d1d484c 100644
--- a/fw/tls.c
+++ b/fw/tls.c
@@ -797,6 +797,7 @@ tfw_tls_conn_send(TfwConn *c, TfwMsg *msg)
 {
        int r;
        TlsCtx *tls = tfw_tls_context(c);
+       int ss_flags = msg->ss_flags;

        T_DBG("TLS %lu bytes (%u bytes)"
              " are to be sent on conn=%pK/sk_write_xmit=%pK ready=%d\n",
@@ -817,8 +818,8 @@ tfw_tls_conn_send(TfwConn *c, TfwMsg *msg)
         * We can not send the alert on conn_drop hook, because the hook
         * is called on already closed socket.
         */
-       if (msg->ss_flags & SS_F_CONN_CLOSE) {
-               int close_type = msg->ss_flags & __SS_F_FORCE ?
+       if (ss_flags & SS_F_CONN_CLOSE) {
+               int close_type = ss_flags & __SS_F_FORCE ?
                        TTLS_F_ST_CLOSE : TTLS_F_ST_SHUTDOWN;

                spin_lock(&tls->lock);
kingluo commented 1 month ago

@krizhanovsky I'm afraid the story isn't over yet. Since we've assumed that, if ss_send returns without error, the message is valid to access later (either from tfw_tls_conn_send or its caller, e.g. tfw_cli_conn_send), we have another problem: cli_conn may also be invalid if:

  1. msg is sent successfully and gets released as in this issue, which also drops a reference on cli_conn.
  2. msg is the last request for this cli_conn.
  3. The connection is closed by keepalive_timer_cb or by a TCP FIN/RST from the peer after 1.
  4. All of the above happen before cli_conn is accessed again.

If this related bug is real, then we need to take a reference with tfw_connection_get() before the msg goes into ss_send, e.g.:

static int
tfw_tls_conn_send(TfwConn *c, TfwMsg *msg)
{
    int r;
    TlsCtx *tls = tfw_tls_context(c);

    T_DBG("TLS %lu bytes (%u bytes)"
          " are to be sent on conn=%pK/sk_write_xmit=%pK ready=%d\n",
          msg->len, tls->io_out.msglen + TLS_HEADER_SIZE, c,
          c->sk->sk_write_xmit, ttls_xfrm_ready(tls));

    if (ttls_xfrm_ready(tls)) {
        msg->ss_flags |= SS_SKB_TYPE2F(TTLS_MSG_APPLICATION_DATA) |
            SS_F_ENCRYPT;
    }
        // <------------------- must guard `cli_conn` here------------------------------
    r = ss_send(c->sk, &msg->skb_head,
            msg->ss_flags & ~SS_F_CLOSE_FORCE);
    if (r)
        return r;

    /*
     * We can not send the alert on conn_drop hook, because the hook
     * is called on already closed socket.
     */
    if (msg->ss_flags & SS_F_CONN_CLOSE) {
        int close_type = msg->ss_flags & __SS_F_FORCE ?
            TTLS_F_ST_CLOSE : TTLS_F_ST_SHUTDOWN;

        spin_lock(&tls->lock);  // <------------- `cli_conn` access
        r = ttls_close_notify(tls, close_type);
        spin_unlock(&tls->lock);
    }

    return r;
}

as well as in the callers, which need to access cli_conn after tfw_tls_conn_send, e.g. tfw_cli_conn_send (a sketch of such a guard follows the excerpt):

int
tfw_cli_conn_send(TfwCliConn *cli_conn, TfwMsg *msg)
{
    int r;

        // <------------------- must guard `cli_conn` here------------------------------
    r = tfw_connection_send((TfwConn *)cli_conn, msg);
    /*
     * The lock is needed because the timer deletion was moved from release() to
     * drop(). While release() is called when there are no other users, there is
     * no such luxury with drop() and the connection can still be used due to
     * lingering threads.
     */
    spin_lock(&cli_conn->timer_lock); // <------------- `cli_conn` access
    if (timer_pending(&cli_conn->timer))
        mod_timer(&cli_conn->timer,
              jiffies +
              msecs_to_jiffies((long)tfw_cli_cfg_ka_timeout * 1000));
    spin_unlock(&cli_conn->timer_lock);
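
For illustration, the guard could bracket the whole send with the existing tfw_connection_get()/tfw_connection_put() helpers; this is only a rough sketch of the idea, not the actual fix from the PR:

int
tfw_cli_conn_send(TfwCliConn *cli_conn, TfwMsg *msg)
{
    int r;

    /*
     * Sketch: pin the connection for the whole call, so that the
     * timer_lock access below stays valid even if the message (and the
     * connection reference held through it) is released from another
     * CPU right after a successful ss_send().
     */
    tfw_connection_get((TfwConn *)cli_conn);

    r = tfw_connection_send((TfwConn *)cli_conn, msg);

    spin_lock(&cli_conn->timer_lock);
    if (timer_pending(&cli_conn->timer))
        mod_timer(&cli_conn->timer,
              jiffies +
              msecs_to_jiffies((long)tfw_cli_cfg_ka_timeout * 1000));
    spin_unlock(&cli_conn->timer_lock);

    tfw_connection_put((TfwConn *)cli_conn);

    return r;
}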

Functions in question:

     tfw_h2_resp_fwd (m) ----> tfw_cli_conn_send (c) -> tfw_tls_conn_send (m/c)
__tfw_http_resp_fwd (m) --/

As discussed, these derivative issues will be fixed at https://github.com/tempesta-tech/tempesta/pull/2182.

EvgeniiMekhanik commented 1 month ago

I looked at tfw_cli_conn_send and I think we have no problem with that function.

We may have a problem with tfw_tls_conn_send, because this function is called from tfw_connection_send in many places. What happens if we receive a FIN while we are calling this function? It seems we rely on the fact that we process the FIN on the same CPU, under the socket lock, as the CPU calling tfw_connection_send (@krizhanovsky please also look at this carefully). The only place where we don't do this is tfw_h2_error_resp, which is called when we fail to process a RESPONSE, because we process the response from the backend on another CPU, under the server socket lock!

krizhanovsky commented 1 month ago

@EvgeniiMekhanik @kingluo There are only 4 places where tfw_connection_send() is called:

$ grep 'tfw_connection_send(' *.[ch]
connection.c:tfw_connection_send(TfwConn *conn, TfwMsg *msg)
connection.h:int tfw_connection_send(TfwConn *conn, TfwMsg *msg);
http.c: if (!(r = tfw_connection_send((TfwConn *)srv_conn, (TfwMsg *)req)))
http.c: * requests in @fwd_queue. Typically tfw_connection_send() or its pure
http_frame.c:   if ((r = tfw_connection_send((TfwConn *)conn, &msg)))
sock_clnt.c:    r = tfw_connection_send((TfwConn *)cli_conn, msg);
websocket.c:    if ((r = tfw_connection_send(conn->pair, &msg))) {

and it seems we really have an issue, e.g. in __tfw_http_resp_fwd(), which calls tfw_http_resp_pair_free() after tfw_cli_conn_send() - in my understanding this could lead to a double free.

I'd propose to fix this with either:

  1. introducing a reference counter in TfwMsg, so that once ss_send() is called, i.e. the message is passed to another context, the refcounter is increased (see the sketch below);
  2. reviewing the code paths and making sure that the message is deleted only by the callback from ss_send().
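
A minimal sketch of what option 1 could look like; the refcnt field and the tfw_msg_get()/tfw_msg_put() helpers are hypothetical and do not exist in the tree today:

#include <linux/refcount.h>

/* Hypothetical: a refcount_t embedded in TfwMsg, set to 1 on allocation. */
typedef struct {
    /* ... existing TfwMsg fields ... */
    refcount_t  refcnt;
} TfwMsgSketch;

static inline void
tfw_msg_get(TfwMsgSketch *m)
{
    refcount_inc(&m->refcnt);
}

static inline void
tfw_msg_put(TfwMsgSketch *m, void (*free_cb)(TfwMsgSketch *))
{
    /* Whichever context drops the last reference frees the message. */
    if (refcount_dec_and_test(&m->refcnt))
        free_cb(m);
}

The sender would take a reference before ss_send() and drop it only after its last access to the message, while the transmit-side destructor would drop its own reference instead of freeing unconditionally.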

It seems the issue was wrongly closed by the PR merge, so I am reopening it.

kingluo commented 1 month ago

@krizhanovsky

and it seems we really have an issue, e.g. in __tfw_http_resp_fwd(), which calls tfw_http_resp_pair_free() after tfw_cli_conn_send() - in my understanding there could be double freeing.

No, it releases msg only when tfw_cli_conn_send fails, which is fine. As I said, the only bug in msg usage is referencing msg after ss_send succeeds (and the callers' handling of msg after that).

EvgeniiMekhanik commented 1 month ago

For HTTP/1 we don't destroy the message in another context; we do that only for HTTP/2.

kingluo commented 1 month ago

Let me clarify my opinion again, because our discussion seems to have become a bit messy.

We have two sorts of bugs:

  1. m: msg use-after-free.
  2. c: when msg gets freed, it also drops a reference on its associated connection, so the connection may already be freed before we use it again; see the conditions above: https://github.com/tempesta-tech/tempesta/issues/2169#issuecomment-2249686288

All in all, we have a graph here:

     tfw_h2_resp_fwd (m) ----> tfw_cli_conn_send (c?) -> tfw_tls_conn_send (m/c)
__tfw_http_resp_fwd (m) --/

So I suggest that all msg-related bugs be fixed here, while the cli_conn-related bugs are fixed in https://github.com/tempesta-tech/tempesta/pull/2182.

c?: https://github.com/tempesta-tech/tempesta/pull/2182#issuecomment-2255705530


@krizhanovsky yes, as the graph shows, there are still two suspicious bugs to fix, but since they're side issues (or, in other words, similar issues) beyond the scope of this one, I closed this issue with the PR. I'll make sure to fix them later if you think it's necessary.

void
tfw_h2_resp_fwd(TfwHttpResp *resp)
{
    bool resp_in_xmit =
        (TFW_SKB_CB(resp->msg.skb_head)->opaque_data == resp);
    TfwHttpReq *req = resp->req;
    TfwConn *conn = req->conn;

    tfw_connection_get(conn);
    do_access_log(resp);

    if (tfw_cli_conn_send((TfwCliConn *)conn, (TfwMsg *)resp)) {
        T_DBG("%s: cannot send data to client via HTTP/2\n", __func__);
        TFW_INC_STAT_BH(serv.msgs_otherr);
        tfw_connection_close(conn, true);
        /* We can't send response, so we should free it here. */
        resp_in_xmit = false;
    } else {
                // <----------------------- `resp` is referenced after a successful send
        TFW_INC_STAT_BH(serv.msgs_forwarded);
        tfw_inc_global_hm_stats(resp->status);
    }
static void
__tfw_http_resp_fwd(TfwCliConn *cli_conn, struct list_head *ret_queue)
{
    TfwHttpReq *req, *tmp;

    list_for_each_entry_safe(req, tmp, ret_queue, msg.seq_list) {
        BUG_ON(!req->resp);
        tfw_http_resp_init_ss_flags(req->resp);
        if (tfw_cli_conn_send(cli_conn, (TfwMsg *)req->resp)) {
            tfw_connection_close((TfwConn *)cli_conn, true);
            return;
        }
                // <----------------------- `req` and `resp` are referenced after a successful send
        TFW_INC_STAT_BH(serv.msgs_forwarded);
        tfw_inc_global_hm_stats(req->resp->status);
        list_del_init(&req->msg.seq_list);
        tfw_http_resp_pair_free(req);
    }
}
krizhanovsky commented 1 month ago

@kingluo OK, thank you for the explanation. Did I understand correctly that you propose to close the issue since the msg bug is fixed in #2180, and to leave the connection fixes for #2182? Also, since you approved #2182, is the connection issue also gone in that PR? If everything is correct, then feel free to close the issue.

kingluo commented 1 month ago

@kingluo OK, thank you for the explanation. Did I understand correctly that you propose to close the issue since the msg bug is fixed in #2180, and to leave the connection fixes for #2182? Also, since you approved #2182, is the connection issue also gone in that PR? If everything is correct, then feel free to close the issue.

Yes, no cli_conn issues will be fixed here, but the msg side issues above can be fixed here along the way, if necessary.

krizhanovsky commented 1 month ago

Basically, if we have some problem, then we must have an appropriate issue for it. #2180 is merged, and we can keep this issue for the rest of the side bugs.