s0nx commented 1 year ago

Motivation

Memory allocation failures should not result in a kernel crash.

Scope

Currently, a high volume of legitimate traffic can cause OOM, which in turn leads to a kernel panic. For example, a VM with 6 GB of RAM becomes unresponsive within seconds under an h2load test with 1k connections and 2k streams per connection.

[ 3893.943876] Out of memory and no killable processes...                                                                                                                   
[ 3893.943877] Kernel panic - not syncing: System is deadlocked on memory                                                                                                   
[ 3893.943879] CPU: 3 PID: 724 Comm: systemd-userdbd Tainted: G        W  OE     5.10.35perf+ #1                                                                            
[ 3893.943880] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015                                                                                    
[ 3893.943881] Call Trace:                                                                                                                                                  
[ 3893.943883]  dump_stack+0x6b/0x83                                                                                                                                        
[ 3893.943887]  panic+0xf1/0x2d3                                                                                                                                            
[ 3893.943892]  ? printk+0x48/0x4a                                                                                                                                          
[ 3893.943893]  out_of_memory.cold+0x2f/0x7e                                                                                                                                
[ 3893.943895]  __alloc_pages_slowpath.constprop.0+0xba3/0xc70                                                                                                              
[ 3893.943897]  __alloc_pages_nodemask+0x2e3/0x310                                                                                                                          
[ 3893.943899]  alloc_pages_vma+0x80/0x260                                                                                                                                  
[ 3893.943901]  do_swap_page+0x6fc/0x7d0                                                                                                                                    
[ 3893.943903]  handle_mm_fault+0xd98/0x1950                                                                                                                                
[ 3893.943905]  do_user_addr_fault+0x1bb/0x3f0                                                                                                                              
[ 3893.943907]  exc_page_fault+0x67/0x150                                                                                                                                   
[ 3893.943909]  asm_exc_page_fault+0x1e/0x30
[ 3893.943910] RIP: 0010:__put_user_nocheck_4+0x3/0x11
[ 3893.943912] Code: 00 00 48 39 d9 73 54 0f 01 cb 66 89 01 31 c9 0f 01 ca c3 0f 1f 44 00 00 48 bb fd ef ff ff ff 7f 00 00 48 39 d9 73 34 0f 01 cb <89> 01 31 c9 0f 01 ca c3 66 0f 1f 44 00 00 48 bb f9 ef ff ff ff 7f
[ 3893.943913] RSP: 0000:ffffc900004f7dd0 EFLAGS: 00050202                                                                                                                  
[ 3893.943915] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 000055ce8da86c00                                                                                            
[ 3893.943916] RDX: 0000000000000000 RSI: ffff8881fe76a3e8 RDI: ffff8881225dd280                                                                                            
[ 3893.943917] RBP: ffff8881ff170398 R08: 0000000000000001 R09: ffff88812730d000                                                                                            
[ 3893.943918] R10: ffff88812730d050 R11: 0000000000000000 R12: ffffc900004f7ec8                                                                                            
[ 3893.943919] R13: ffffc900004f7e28 R14: 000055ce8da86c00 R15: ffff8881ff170380                                                                                            
[ 3893.943922]  ep_send_events_proc+0xf1/0x1f0                                                                                                                              
[ 3893.943923]  ? ep_read_events_proc+0xc0/0xc0                                                                                                                             
[ 3893.943924]  ep_scan_ready_list.constprop.0+0x96/0x180                                                                                                                   
[ 3893.943926]  do_epoll_wait+0x241/0x640                                                                                                                                   
[ 3893.943927]  ? auditd_test_task+0x33/0x40                                                                                                                                
[ 3893.943929]  ? add_wait_queue_exclusive+0x70/0x70                                                                                                                        
[ 3893.943930]  __x64_sys_epoll_wait+0x17/0x20                                                                                                                              
[ 3893.943932]  do_syscall_64+0x33/0x80                                                                                                                                     
[ 3893.943933]  entry_SYSCALL_64_after_hwframe+0x44/0xa9                                                                                                                    
[ 3893.943935] RIP: 0033:0x7f986b0cb5fa                                                                                                                                     
[ 3893.943936] Code: Unable to access opcode bytes at RIP 0x7f986b0cb5d0.                                                                                                   
[ 3893.943937] RSP: 002b:00007ffd59d656b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e8                                                                                       
[ 3893.943939] RAX: ffffffffffffffda RBX: 000055ce8da84fd0 RCX: 00007f986b0cb5fa                                                                                            
[ 3893.943940] RDX: 0000000000000008 RSI: 000055ce8da86c00 RDI: 0000000000000005                                                                                            
[ 3893.943941] RBP: 00000000000002d4 R08: 0000000000000008 R09: 00007f986b4abae0                                                                                            
[ 3893.943942] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000028                                                                                            
[ 3893.943944] R13: 0000000000000008 R14: 0000000000000004 R15: 000055ce8da85160                                                                                            
[ 3893.944265] Kernel Offset: disabled                                                                                                                                      
[ 3893.944267] ---[ end Kernel panic - not syncing: System is deadlocked on memory ]---

Testing

Tempesta config:

listen 443 proto=h2;

cache 1;
cache_fulfill * *;

srv_group ngx_local {
        server 127.0.0.1:8000 conns_n=4;
}

vhost f35tfw.local {
        tls_certificate /root/certs/tempesta/RSA/tfw-root.crt;
        tls_certificate_key /root/certs/tempesta/RSA/tfw-root.key;
        proxy_pass ngx_local;
}

http_chain {
        -> f35tfw.local;
}

Cache warmup: h2load https://f35tfw.local -t 1 -c 1 -n 1

Workload: h2load https://f35tfw.local -t 2 -c 1000 -D 30 -m 2048

s0nx commented 1 year ago

#1346

krizhanovsky commented 1 year ago

@s0nx why is the issue crucial? Doesn't Nginx on a vanilla kernel (not ours) lead to OOM?

const-t commented 6 months ago

Another aspect of this issue is that we need to test the error-handling branches for memory allocation failures. Some of the functions are not reentrant. We also need to consider using the error injection mechanism provided by the kernel or our own implementation of error injection.
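
To illustrate the "own implementation" option, here is a minimal userspace sketch of allocation failure injection. It is not Tempesta code: `fi_arm()`, `fi_alloc()`, `struct pair` and the `pair_*` helpers are hypothetical names invented for this example, and `malloc()` stands in for the kernel allocators. The idea is to fail the Nth allocation, then walk N upward until the code under test succeeds, so every error branch runs once. The kernel's own fault-injection framework (for example `failslab` and `fail_page_alloc`) serves the same purpose inside the kernel.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical injection state: fail the Nth allocation (1-based); 0 disables. */
static unsigned long fi_fail_at;
static unsigned long fi_count;

static void fi_arm(unsigned long nth)
{
	fi_fail_at = nth;
	fi_count = 0;
}

/* Allocation wrapper: returns NULL on the armed call, else defers to malloc(). */
static void *fi_alloc(size_t size)
{
	if (fi_fail_at && ++fi_count == fi_fail_at)
		return NULL;
	return malloc(size);
}

/* Example code under test: needs two buffers, must not leak if the second fails. */
struct pair {
	char *hdr;
	char *body;
};

static int pair_init(struct pair *p, size_t hdr_len, size_t body_len)
{
	p->hdr = fi_alloc(hdr_len);
	if (!p->hdr)
		return -1;
	p->body = fi_alloc(body_len);
	if (!p->body) {
		/* Roll back the first allocation on partial failure. */
		free(p->hdr);
		p->hdr = NULL;
		return -1;
	}
	return 0;
}

static void pair_free(struct pair *p)
{
	free(p->hdr);
	free(p->body);
	p->hdr = p->body = NULL;
}

/*
 * Fail allocation 1, then 2, ... until pair_init() succeeds.
 * Returns the number of failure points exercised, or -1 if a
 * failed pair_init() left a buffer behind.
 */
static int pair_test_all_failures(void)
{
	int n;

	for (n = 1; ; n++) {
		struct pair p = { NULL, NULL };

		fi_arm(n);
		if (pair_init(&p, 16, 64) == 0) {
			pair_free(&p);
			fi_arm(0);
			return n - 1;
		}
		if (p.hdr || p.body) {
			fi_arm(0);
			return -1;
		}
	}
}
```

Here `pair_test_all_failures()` covers both allocation sites in `pair_init()`. The same loop applies to any function that allocates through the wrapper, which helps find error paths that leak or leave state half-initialized.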

krizhanovsky commented 1 month ago

This task is crucial for resistance to DDoS attacks. Default rate limits, and maybe a global rate limit, will probably mitigate the problem.
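
As a rough illustration of what a global rate limit could look like, the sketch below uses a token bucket. This is an assumed technique chosen for the example, not how Tempesta implements its limits; `struct tb`, `tb_init()` and `tb_allow()` are hypothetical names, and the caller supplies the timestamps. A request is admitted only while tokens remain, so a burst beyond the configured capacity is rejected before it can allocate memory.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical global limiter state (a real one would need locking or per-CPU state). */
struct tb {
	uint64_t rate;    /* tokens added per second */
	uint64_t burst;   /* bucket capacity */
	uint64_t tokens;  /* tokens currently available */
	uint64_t last_ns; /* timestamp of the last refill */
};

static void tb_init(struct tb *b, uint64_t rate, uint64_t burst, uint64_t now_ns)
{
	b->rate = rate;
	b->burst = burst;
	b->tokens = burst;
	b->last_ns = now_ns;
}

/* Returns 1 if the request may proceed, 0 if it should be rejected. */
static int tb_allow(struct tb *b, uint64_t now_ns)
{
	/* Whole tokens earned since the last refill (may overflow for huge gaps). */
	uint64_t add = (now_ns - b->last_ns) * b->rate / NSEC_PER_SEC;

	if (add) {
		uint64_t total = b->tokens + add;

		b->tokens = total > b->burst ? b->burst : total;
		/* Advance only by the time that produced whole tokens. */
		b->last_ns += add * NSEC_PER_SEC / b->rate;
	}
	if (!b->tokens)
		return 0;
	b->tokens--;
	return 1;
}
```

With `rate = 10` and `burst = 2`, two back-to-back requests pass, the third is rejected, and one more is admitted after 100 ms.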

tempesta-tech / tempesta

Handle OOM events more gracefully #1789
