tempesta-tech / tempesta

All-in-one solution for high performance web content delivery and advanced protection against DDoS and web attacks
https://tempesta-tech.com/
GNU General Public License v2.0
621 stars 103 forks source link

Invalid server connection reference counting #2183

Closed EvgeniiMekhanik closed 2 months ago

EvgeniiMekhanik commented 3 months ago

During my work on https://github.com/tempesta-tech/tempesta/issues/2178 I found a new BUG.

kernel BUG at /home/evgeny/workdir/tempesta/fw/connection.h:487 [ 832.096108] invalid opcode: 0000 [#1] SMP NOPTI [ 832.096109] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G OE 5.10.35+ #298 [ 832.096110] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 832.096124] RIP: 0010:tfw_sock_srv_connect_drop+0x1a3/0x1e0 [tempesta_fw]

Step to reproduce: run test from https://github.com/tempesta-tech/tempesta-test/commit/1f7290ebab8fcfc7701583bad7c90706bb9345ee http_resp_code_block_2 a lot of time. May be it is necessary to add a lot of debug message to catch this bug.

EvgeniiMekhanik commented 3 months ago

https://github.com/tempesta-tech/tempesta/tree/MekhanikEvgenii/2183-dbg this is a branch to reproduce crash with debug info. It seems that there are a lot of bugs which can be reproduces by t_frang.test_config.TestOverridingInheritance.test_default_not_override_http_resp_code_block_2 test.

[ 1646.358134] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 1646.359559] tfw_sock_srv_connect_drop: conn ffffa067fb2f9260 refcnt before put -1073741824 | 1 | is_client 0
[ 1646.360311] #PF: supervisor read access in kernel mode
[ 1646.423105] #PF: error_code(0x0000) - not-present page
[ 1646.423524] PGD 0 P4D 0 
[ 1646.423738] Oops: 0000 [#1] SMP NOPTI
[ 1646.424040] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OE     5.10.35+ #298
[ 1646.424682] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 1646.425473] RIP: 0010:tfw_http_cli_error_resp_and_log.cold+0x1f5/0x288 [tempesta_fw]
[ 1646.426102] Code: 00 00 00 44 89 e2 0f b6 4d 87 8b 75 8c 4c 89 ff 83 f2 01 0f b6 d2 e8 c6 5e f7 ff 4d 8b a6 a0 00 00 00 65 44 8b 05 a7 e6 e9 3e <45> 8b 4c 24 08 41 81 e1 00 01 00 00 4d 85 e4 74 66 41 8b 4c 24 40
[ 1646.427617] RSP: 0018:ffffb5cfc0003950 EFLAGS: 00010246
[ 1646.428044] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1646.428622] RDX: 0000000000000000 RSI: ffffa068b7c18a40 RDI: ffffa068b7c18a40
[ 1646.429201] RBP: ffffb5cfc00039e0 R08: 0000000000000000 R09: 6e6f635f7672735f
[ 1646.429783] R10: 6f72645f7463656e R11: 206e6e6f63203a70 R12: 0000000000000000
[ 1646.430360] R13: 0000000000000193 R14: ffffa067fb2df020 R15: ffffa067fdd61c40
[ 1646.430959] FS:  0000000000000000(0000) GS:ffffa068b7c00000(0000) knlGS:0000000000000000
[ 1646.431615] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1646.432082] CR2: 0000000000000008 CR3: 00000001aa21e000 CR4: 0000000000750ef0
[ 1646.432663] PKRU: 55555554
[ 1646.432890] Call Trace:
[ 1646.433096]  <IRQ>
[ 1646.433275]  tfw_http_resp_cache+0x155/0x1c0 [tempesta_fw]
[ 1646.433729]  tfw_http_msg_process_generic+0x54d/0x6e0 [tempesta_fw]
[ 1646.434247]  ? __coalesce_frag+0x99/0x150 [tempesta_fw]
[ 1646.434686]  tfw_http_msg_process+0x3e/0x60 [tempesta_fw]
[ 1646.435130]  tfw_connection_recv+0xbb/0x140 [tempesta_fw]
[ 1646.435582]  ? ss_skb_unroll+0xce/0x280 [tempesta_fw]
[ 1646.436001]  ss_tcp_process_data+0x20a/0x4b0 [tempesta_fw]
[ 1646.436469]  ss_tcp_data_ready+0x57/0x140 [tempesta_fw]
[ 1646.436900]  tcp_data_ready+0x2b/0xd0
[ 1646.437204]  tcp_data_queue+0x805/0xe50
[ 1646.437521]  tcp_rcv_established+0x254/0x910
[ 1646.437872]  tcp_v4_do_rcv+0x140/0x200
[ 1646.438186]  tcp_v4_rcv+0xcd0/0xe20
[ 1646.438488]  ip_protocol_deliver_rcu+0x44/0x230
[ 1646.438859]  ip_local_deliver_finish+0x48/0x60
[ 1646.439229]  ip_local_deliver+0xf8/0x110
[ 1646.439553]  ? ip_protocol_deliver_rcu+0x230/0x230
[ 1646.439945]  ip_rcv_finish+0x87/0xa0
[ 1646.440240]  ip_rcv+0xce/0xe0
[ 1646.440492]  ? ip_rcv_finish_core.constprop.0+0x470/0x470
[ 1646.440935]  __netif_receive_skb_one_core+0x86/0xa0
[ 1646.441334]  __netif_receive_skb+0x18/0x60
[ 1646.441676]  process_backlog+0x9e/0x170
[ 1646.441992]  net_rx_action+0x13b/0x430
[ 1646.442310]  ? try_to_wake_up+0x218/0x6c0
[ 1646.442662]  __do_softirq+0xe3/0x340
[ 1646.442958]  asm_call_irq_on_stack+0xf/0x20
[ 1646.443301]  </IRQ>
[ 1646.443480]  do_softirq_own_stack+0x3d/0x50
[ 1646.443824]  irq_exit_rcu+0xa2/0xe0
[ 1646.444114]  sysvec_call_function_single+0x3d/0x90
[ 1646.444510]  asm_sysvec_call_function_single+0x12/0x20
[ 1646.444930] RIP: 0010:native_safe_halt+0xe/0x10
[ 1646.445302] Code: 39 ff ff ff 4c 89 ee 48 c7 c7 40 ba a5 9d e8 d9 6a 93 ff e9 01 ff ff ff cc cc cc cc e9 07 00 00 00 0f 00 2d 16 ac 49 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 06 ac 49 00 f4 c3 cc cc 0f 1f 44 00
[ 1646.446811] RSP: 0018:ffffffff9da03df0 EFLAGS: 00000206
[ 1646.447243] RAX: ffffffff9cb70bd0 RBX: 0000000000000000 RCX: ffffa068b7c2cdc0
[ 1646.447820] RDX: 000000000048f026 RSI: ffffffff9da03d90 RDI: 0000017f529f6d18
[ 1646.448397] RBP: ffffffff9da03df8 R08: 0000000000000001 R09: 0000017f4fb27df7
[ 1646.448993] R10: 0000000000000000 R11: 000000000002c040 R12: ffffffff9da1bb80
[ 1646.449570] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000092
[ 1646.450148]  ? __sched_text_end+0x4/0x4
[ 1646.450476]  ? default_idle+0xe/0x20
[ 1646.450772]  arch_cpu_idle+0x15/0x20
[ 1646.451067]  default_idle_call+0x3d/0xc0
[ 1646.451390]  do_idle+0x215/0x2a0
[ 1646.451658]  cpu_startup_entry+0x20/0x30
[ 1646.451981]  rest_init+0xcc/0xee
[ 1646.452250]  ? acpi_enable_subsystem+0x20b/0x213
[ 1646.452634]  arch_call_rest_init+0xe/0x1b
[ 1646.452964]  start_kernel+0x8a5/0x8cb
[ 1646.453266]  x86_64_start_reservations+0x24/0x26
[ 1646.453643]  x86_64_start_kernel+0xf0/0xf7
[ 1646.453981]  secondary_startup_64_no_verify+0xc2/0xcb
[ 1646.454394] Modules linked in: tempesta_fw(OE) tempesta_db(OE) tempesta_tls(OE) tempesta_lib(OE) tls sha256_ssse3 sha512_ssse3 md4 cmac nls_utf8 cifs libarc4 fscache libdes veth nft_masq nvme_tcp nvme_fabrics nvme_core vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay binfmt_misc nls_iso8859_1 virtio_gpu virtio_dma_buf kvm_intel drm_kms_helper kvm cec rc_core fb_sys_fops syscopyarea crct10dif_pclmul ghash_clmulni_intel sysfillrect aesni_intel sysimgblt crypto_simd joydev cryptd glue_helper input_leds mac_hid sch_fq_codel serio_raw qemu_fw_cfg msr parport_pc ppdev lp drm parport ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c psmouse e1000 crc32_pclmul floppy pata_acpi
[ 1646.454446]  i2c_piix4 [last unloaded: tempesta_lib]
[ 1646.461850] CR2: 0000000000000008
[ 1646.462127] ---[ end trace 63500d0fcf9291fc ]---
[ 1646.462524] RIP: 0010:tfw_http_cli_error_resp_and_log.cold+0x1f5/0x288 [tempesta_fw]
[ 1646.463152] Code: 00 00 00 44 89 e2 0f b6 4d 87 8b 75 8c 4c 89 ff 83 f2 01 0f b6 d2 e8 c6 5e f7 ff 4d 8b a6 a0 00 00 00 65 44 8b 05 a7 e6 e9 3e <45> 8b 4c 24 08 41 81 e1 00 01 00 00 4d 85 e4 74 66 41 8b 4c 24 40
[ 1646.464655] RSP: 0018:ffffb5cfc0003950 EFLAGS: 00010246
[ 1646.465083] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1646.465661] RDX: 0000000000000000 RSI: ffffa068b7c18a40 RDI: ffffa068b7c18a40
[ 1646.466239] RBP: ffffb5cfc00039e0 R08: 0000000000000000 R09: 6e6f635f7672735f
[ 1646.466844] R10: 6f72645f7463656e R11: 206e6e6f63203a70 R12: 0000000000000000
[ 1646.467427] R13: 0000000000000193 R14: ffffa067fb2df020 R15: ffffa067fdd61c40
[ 1646.468006] FS:  0000000000000000(0000) GS:ffffa068b7c00000(0000) knlGS:0000000000000000
[ 1646.468660] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1646.469128] CR2: 0000000000000008 CR3: 00000001aa21e000 CR4: 0000000000750ef0
[ 1646.469707] PKRU: 55555554
EvgeniiMekhanik commented 3 months ago

It seems we have a problem at least at three places.

EvgeniiMekhanik commented 3 months ago

Also I catch

[  922.766862] BUG tfw_h2_conn_cache (Tainted: G    B      OE    ): Objects remaining in tfw_h2_conn_cache on __kmem_cache_shutdown()
[  922.769314] -----------------------------------------------------------------------------
[  922.769314] 
[  922.770810] INFO: Slab 0x00000000224d306d objects=12 used=1 fp=0x0000000022150e17 flags=0x17ffffc0010200
[  922.772057] CPU: 1 PID: 73625 Comm: rmmod Tainted: G    B      OE     5.10.35+ #298
[  922.772902] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  922.773862] Call Trace:
[  922.774111]  dump_stack+0x70/0x8b
[  922.774409]  slab_err+0xb7/0xdc
[  922.774667]  ? slub_cpu_dead+0x120/0x120
[  922.774979]  ? slub_cpu_dead+0x120/0x120
[  922.775308]  ? on_each_cpu_cond_mask+0x48/0x90
[  922.775675]  __kmem_cache_shutdown+0x165/0x330
[  922.776038]  kmem_cache_destroy+0x59/0x110
[  922.776380]  tfw_sock_clnt_exit+0x21/0x40 [tempesta_fw]
[  922.776818]  tfw_exit+0xd7/0xea [tempesta_fw]
[  922.777161]  __do_sys_delete_module.constprop.0+0x181/0x2f0
[  922.777605]  ? exit_to_user_mode_prepare+0x35/0x150
[  922.777985]  __x64_sys_delete_module+0x12/0x20
[  922.778350]  do_syscall_64+0x38/0x90
[  922.778632]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  922.779039] RIP: 0033:0x7fda1e0e8aeb
[  922.779321] Code: 73 01 c3 48 8b 0d 45 33 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 15 33 0f 00 f7 d8 64 89 01 48
[  922.780906] RSP: 002b:00007fff88004288 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  922.781555] RAX: ffffffffffffffda RBX: 000055cdf6506770 RCX: 00007fda1e0e8aeb
[  922.782175] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055cdf65067d8
[  922.782773] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  922.783374] R10: 00007fda1e180ac0 R11: 0000000000000206 R12: 00007fff880044e0
[  922.783914] R13: 000055cdf65062a0 R14: 00007fff8800669e R15: 000055cdf6506770
[  922.784475] INFO: Object 0x00000000733c6f5e @offset=17976
[  922.784860] kmem_cache_destroy tfw_h2_conn_cache: Slab cache still has objects
[  922.785652] CPU: 1 PID: 73625 Comm: rmmod Tainted: G    B      OE     5.10.35+ #298
[  922.786327] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  922.787036] Call Trace:
[  922.787280]  dump_stack+0x70/0x8b
[  922.787560]  kmem_cache_destroy+0x107/0x110
[  922.787946]  tfw_sock_clnt_exit+0x21/0x40 [tempesta_fw]
[  922.788427]  tfw_exit+0xd7/0xea [tempesta_fw]
[  922.788792]  __do_sys_delete_module.constprop.0+0x181/0x2f0
[  922.789286]  ? exit_to_user_mode_prepare+0x35/0x150
[  922.789696]  __x64_sys_delete_module+0x12/0x20
[  922.790051]  do_syscall_64+0x38/0x90
[  922.790359]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  922.790762] RIP: 0033:0x7fda1e0e8aeb
[  922.791073] Code: 73 01 c3 48 8b 0d 45 33 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 15 33 0f 00 f7 d8 64 89 01 48
[  922.792700] RSP: 002b:00007fff88004288 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  922.793361] RAX: ffffffffffffffda RBX: 000055cdf6506770 RCX: 00007fda1e0e8aeb
[  922.793955] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055cdf65067d8
[  922.794595] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  922.795225] R10: 00007fda1e180ac0 R11: 0000000000000206 R12: 00007fff880044e0
[  922.795874] R13: 000055cdf65062a0 R14: 00007fff8800669e R15: 000055cdf6506770
[  922.850320] [tdb] Shutdown Tempesta DB