scylladb / scylla-bench

S-B gets OOMkilled running load with large partitions (memory leak?) #123

Open vponomaryov opened 1 year ago

vponomaryov commented 1 year ago

- scylla-bench: scylladb/hydra-loaders:scylla-bench-v0.1.16 (docker)
- scylla: 2023.1.0~rc4-0.20230402.f90af81a13cc with build-id 4db7799d283c7ca63c7639b04db34c301dbcfb66 (non-docker)
- Platform: AWS / eu-west-1
- DB nodes num: 4
- DB node instance type: i3en.3xlarge
- Loaders num: 3
- Loader node instance type: c5n.2xlarge

Loader commands:

- scylla-bench -workload=uniform -mode=read -replication-factor=3 -partition-count=60
  -clustering-row-count=10000000 -retry-number=0 -clustering-row-size=2048 -rows-per-request=2000
  -timeout=180s -concurrency=700 -max-rate=64000  -duration=5760m -connection-count
  500         
- scylla-bench -workload=uniform -mode=read -replication-factor=3 -partition-count=60
  -clustering-row-count=10000000 -retry-number=0 -clustering-row-size=2048 -rows-per-request=2000
  -timeout=180s -concurrency=700 -max-rate=64000  -duration=5760m -connection-count
  500         
- scylla-bench -workload=uniform -mode=read -replication-factor=3 -partition-count=60
  -clustering-row-count=10000000 -retry-number=0 -clustering-row-size=2048 -rows-per-request=2000
  -timeout=180s -concurrency=700 -max-rate=64000  -duration=5760m -connection-count
  500 

So, running the above commands, one per loader, we got one of the loaders (loader-3) OOM-killed:

Apr 19 18:23:31 kernel: scylla-bench invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Apr 19 18:23:31 kernel: scylla-bench cpuset=8ea88c75ae805f18fb862b1ec82e301af32c4e910b0eb2b87364ad7e62d831d5 mems_allowed=0
Apr 19 18:23:31 kernel: CPU: 5 PID: 21662 Comm: scylla-bench Not tainted 4.20.4-1.el7.elrepo.x86_64 #1
Apr 19 18:23:31 kernel: Hardware name: Amazon EC2 c5n.2xlarge/, BIOS 1.0 10/16/2017                    
Apr 19 18:23:31 kernel: Call Trace:                                                                    
Apr 19 18:23:31 kernel:  dump_stack+0x63/0x88                                                          
Apr 19 18:23:31 kernel:  dump_header+0x78/0x2a4                                                        
Apr 19 18:23:31 kernel:  ? cpuset_mems_allowed_intersects+0x21/0x30                                    
Apr 19 18:23:31 kernel:  oom_kill_process+0x262/0x290                                                  
Apr 19 18:23:31 kernel:  out_of_memory+0x140/0x4b0                                                     
Apr 19 18:23:31 kernel:  __alloc_pages_slowpath+0x863/0xaf9                                            
Apr 19 18:23:31 kernel:  __alloc_pages_nodemask+0x266/0x2a0                                            
Apr 19 18:23:31 kernel:  alloc_pages_current+0x6a/0xb0                                                 
Apr 19 18:23:31 kernel:  __page_cache_alloc+0x73/0x90                                                  
Apr 19 18:23:31 kernel:  filemap_fault+0x40c/0x7c0                                                     
Apr 19 18:23:31 kernel:  ? unlock_page_memcg+0x12/0x20                                                 
Apr 19 18:23:31 kernel:  ? page_add_file_rmap+0x187/0x220                                              
Apr 19 18:23:31 kernel:  ? _cond_resched+0x19/0x30                                                     
Apr 19 18:23:31 kernel:  ? down_read+0x12/0x40                                                         
Apr 19 18:23:31 kernel:  __xfs_filemap_fault+0x72/0x1c0 [xfs]                                          
Apr 19 18:23:31 kernel:  xfs_filemap_fault+0x2c/0x30 [xfs]                                             
Apr 19 18:23:31 kernel:  __do_fault+0x3e/0xc0                                                          
Apr 19 18:23:31 kernel:  __handle_mm_fault+0xbd6/0xe80                                                 
Apr 19 18:23:31 kernel:  handle_mm_fault+0x102/0x220                                                   
Apr 19 18:23:31 kernel:  __do_page_fault+0x21c/0x4c0                                                   
Apr 19 18:23:31 kernel:  do_page_fault+0x37/0x140                                                      
Apr 19 18:23:31 kernel:  ? async_page_fault+0x8/0x30                                                   
Apr 19 18:23:31 kernel:  do_async_page_fault+0x22/0xd0                                                 
Apr 19 18:23:31 kernel:  ? async_page_fault+0x8/0x30                                                   
Apr 19 18:23:31 kernel:  async_page_fault+0x1e/0x30                                                    
Apr 19 18:23:31 kernel: RIP: 0033:0x464200                                                             
Apr 19 18:23:31 kernel: Code: Bad RIP value.                                                           
Apr 19 18:23:31 kernel: RSP: 002b:000000c000653288 EFLAGS: 00010202                                    
Apr 19 18:23:31 kernel: RAX: 000000c0004991e0 RBX: 000000c000499380 RCX: 000000c000499380              
Apr 19 18:23:31 kernel: RDX: 000000c000653380 RSI: 0000000000000000 RDI: 0000000000000101              
Apr 19 18:23:31 kernel: RBP: 000000c0006532f8 R08: 00000000009bef80 R09: 0000024360fc5245              
Apr 19 18:23:31 kernel: R10: 000000000098e160 R11: 000000000000c744 R12: 000000000043e1a0              
Apr 19 18:23:31 kernel: R13: 0000000000000004 R14: 000000c000499380 R15: 000141ff4f158308              
Apr 19 18:23:31 kernel: Mem-Info:                                                                      
Apr 19 18:23:31 kernel: active_anon:4748116 inactive_anon:371183 isolated_anon:0                       
                         active_file:144 inactive_file:77 isolated_file:0                              
                         unevictable:0 dirty:0 writeback:0 unstable:0                                  
                         slab_reclaimable:11569 slab_unreclaimable:23773                               
                         mapped:341 shmem:209 pagetables:17282 bounce:0                                
                         free:38281 free_pcp:2151 free_cma:0
Apr 19 18:23:31 kernel: Node 0 active_anon:18992464kB inactive_anon:1484732kB active_file:576kB inactive_file:308kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1364kB dirty:0kB writeback:0kB shmem:836kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 5292032kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
Apr 19 18:23:31 kernel: Node 0 DMA free:15908kB min:48kB low:60kB high:72kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Apr 19 18:23:31 kernel: lowmem_reserve[]: 0 2915 20441 20441 20441                                     
Apr 19 18:23:31 kernel: Node 0 DMA32 free:79344kB min:9628kB low:12612kB high:15596kB active_anon:2756648kB inactive_anon:187876kB active_file:556kB inactive_file:224kB unevictable:0kB writepending:0kB present:3129252kB managed:3041788kB mlocked:0kB kernel_stack:80kB pagetables:6544kB bounce:0kB free_pcp:3008kB local_pcp:156kB free_cma:0kB
Apr 19 18:23:31 kernel: lowmem_reserve[]: 0 0 17525 17525 17525                                        
Apr 19 18:23:31 kernel: Node 0 Normal free:57872kB min:57900kB low:75844kB high:93788kB active_anon:16235816kB inactive_anon:1296856kB active_file:1072kB inactive_file:0kB unevictable:0kB writepending:0kB present:18284544kB managed:17946372kB mlocked:0kB kernel_stack:4512kB pagetables:62584kB bounce:0kB free_pcp:5696kB local_pcp:168kB free_cma:0kB
Apr 19 18:23:31 kernel: lowmem_reserve[]: 0 0 0 0 0                                                    
Apr 19 18:23:31 kernel: Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
Apr 19 18:23:31 kernel: Node 0 DMA32: 330*4kB (UE) 635*8kB (UME) 496*16kB (UME) 278*32kB (UME) 79*64kB (UME) 18*128kB (UM) 19*256kB (UE) 51*512kB (UM) 17*1024kB (UME) 0*2048kB 0*4096kB = 78976kB
Apr 19 18:23:31 kernel: Node 0 Normal: 272*4kB (UME) 249*8kB (ME) 259*16kB (UME) 331*32kB (UME) 98*64kB (UME) 39*128kB (UME) 15*256kB (UME) 4*512kB (UM) 22*1024kB (ME) 0*2048kB 0*4096kB = 57496kB
Apr 19 18:23:31 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Apr 19 18:23:31 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Apr 19 18:23:31 kernel: 96385 total pagecache pages                                                 
Apr 19 18:23:31 kernel: 96037 pages in swap cache                                                   
Apr 19 18:23:31 kernel: Swap cache stats: add 8936640, delete 8844227, find 4194314/4735878         
Apr 19 18:23:31 kernel: Free swap  = 0kB                                                            
Apr 19 18:23:31 kernel: Total swap = 10485756kB                                                     
Apr 19 18:23:31 kernel: 5357447 pages RAM                                                           
Apr 19 18:23:31 kernel: 0 pages HighMem/MovableOnly                                                 
Apr 19 18:23:31 kernel: 106430 pages reserved                                                       
Apr 19 18:23:31 kernel: 0 pages hwpoisoned                                                          
Apr 19 18:23:31 kernel: Tasks state (memory values in pages):
Apr 19 18:23:31 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Apr 19 18:23:31 kernel: [    875]     0   875    10059      222   118784       85             0 systemd-journal
Apr 19 18:23:31 kernel: [    903]     0   903    12055        2   122880      459         -1000 systemd-udevd
Apr 19 18:23:31 kernel: [    949]     0   949    13886       10   131072      101         -1000 auditd
Apr 19 18:23:31 kernel: [   1052]    81  1052    16580        5   155648      143          -900 dbus-daemon
Apr 19 18:23:31 kernel: [   1056]     0  1056     4162      666    86016      189             0 collectd_export
Apr 19 18:23:31 kernel: [   1058]     0  1058     5389       24    86016       36             0 irqbalance
Apr 19 18:23:31 kernel: [   1060]     0  1060     6599       14    94208       61             0 systemd-logind
Apr 19 18:23:31 kernel: [   1061]   998  1061   153065        0   274432     1895             0 polkitd
Apr 19 18:23:31 kernel: [   1078]   997  1078    29455        2   143360      112             0 chronyd
Apr 19 18:23:31 kernel: [   1133]     0  1133    50361       10   184320      125             0 gssproxy
Apr 19 18:23:31 kernel: [   1341]     0  1341    25729        0   233472      514             0 dhclient
Apr 19 18:23:31 kernel: [   1437]     0  1437    25729       17   225280      506             0 dhclient
Apr 19 18:23:31 kernel: [   1479]     0  1479   143579      102   434176     3216             0 tuned
Apr 19 18:23:31 kernel: [   1484]     0  1484   301877     1671   323584     4005          -999 containerd
Apr 19 18:23:31 kernel: [   1601]     0  1601    22429       16   204800      242             0 master
Apr 19 18:23:31 kernel: [   1615]    89  1615    22472        1   217088      252             0 qmgr
Apr 19 18:23:31 kernel: [   1642]     0  1642   318682     3177   483328     5117          -500 dockerd
Apr 19 18:23:31 kernel: [   1651]     0  1651   254861      223   274432      603             0 collectd
Apr 19 18:23:31 kernel: [   1661]     0  1661    31601       17    94208      141             0 crond
Apr 19 18:23:31 kernel: [   1662]     0  1662    27555        2    65536       31             0 agetty
Apr 19 18:23:31 kernel: [   1665]     0  1665    27555        3    61440       31             0 agetty
Apr 19 18:23:31 kernel: [   2090]     0  2090    38685        2   344064      336             0 sshd
Apr 19 18:23:31 kernel: [   2093]  1000  2093    38685        1   331776      339             0 sshd
Apr 19 18:23:31 kernel: [   2180]     0  2180    27056      123   253952      373             0 syslog-ng
Apr 19 18:23:31 kernel: [   2191]     0  2191    28234       25   253952      232         -1000 sshd
Apr 19 18:23:31 kernel: [   4766]   995  4766   181775     1361   143360     1521             0 node_exporter
Apr 19 18:23:31 kernel: [  16560]    89 16560    22455       13   208896      238             0 pickup
Apr 19 18:23:31 kernel: [  21525]     0 21525    38685        2   339968      330             0 sshd
Apr 19 18:23:31 kernel: [  21528]  1000 21528    38685        0   335872      357             0 sshd
Apr 19 18:23:31 kernel: [  21580]     0 21580   178111      501   102400     1488          -998 containerd-shim
Apr 19 18:23:31 kernel: [  21600]     0 21600      606        8    45056       12             0 tail
Apr 19 18:23:31 kernel: [  21613]     0 21613    59821        2   311296      289             0 sudo
Apr 19 18:23:31 kernel: [  21625]     0 21625   310307     1177   294912     3412             0 docker
Apr 19 18:23:31 kernel: [  21644]     0 21644      620        0    40960       20             0 sh  
Apr 19 18:23:31 kernel: [  21650]     0 21650  8193649  5011664 62324736  2590712             0 scylla-bench
Apr 19 18:23:31 kernel: [  24370]     0 24370    28781      118   262144      176             0 sshd
Apr 19 18:23:31 kernel: [  24989]     0 24989     2991      105    69632        7             0 dhclient-script
Apr 19 18:23:31 kernel: [  25009]     0 25009    27674      122   237568        0             0 sshd
Apr 19 18:23:31 kernel: [  25012]     0 25012    27672      120   241664        1             0 sshd
Apr 19 18:23:31 kernel: [  25019]     0 25019    27672      119   237568        1             0 sshd
Apr 19 18:23:31 kernel: [  25025]     0 25025    27672      119   241664        1             0 sshd
Apr 19 18:23:31 kernel: [  25031]     0 25031    27672      120   241664        1             0 sshd
Apr 19 18:23:31 kernel: [  25038]     0 25038    27143      112   229376        3             0 sshd
Apr 19 18:23:31 kernel: [  25043]     0 25043    26090      114   221184        0             0 sshd
Apr 19 18:23:31 kernel: [  25048]     0 25048    25045      107   217088        0             0 sshd
Apr 19 18:23:31 kernel: [  25051]     0 25051    19766       87   184320        0             0 sshd
Apr 19 18:23:31 kernel: [  25057]     0 25057    15314       67   147456        0             0 sshd
Apr 19 18:23:31 kernel: [  25063]     0 25063     6295       33    81920        0             0 sshd
Apr 19 18:23:31 kernel: Out of memory: Kill process 21650 (scylla-bench) score 967 or sacrifice child
Apr 19 18:23:31 kernel: Killed process 21650 (scylla-bench) total-vm:32774596kB, anon-rss:20046656kB, file-rss:0kB, shmem-rss:0kB

And here is what the loader-3 log looked like:

Configuration
Mode:            read
Workload:        uniform
Timeout:         3m0s
Max error number at row: 1000
Max error number:    unlimited
Retries:        
  number:        0
  min interval:      80ms
  max interval:      1s
  handler:       sb
Consistency level:   quorum
Partition count:     60
Clustering rows:     10000000
Clustering row size:     Fixed(2048)
Rows per request:    2000
Provide upper bound:     false
IN queries:      false
Order by:        none
No lower bound:      false
Page size:       1000
Concurrency:         700
Connections:         500
Maximum rate:        64000 op/s
Client compression:  true
Hdr memory consumption:  344254400 bytes

2023/04/19 17:44:56 [query statement="SELECT pk, ck, v FROM scylla_bench.test WHERE pk = ? AND ck >= ?  LIMIT 2000 " values=[43 3491014] consistency=QUORUM] || ERROR: 0 attempts applied: Operation timed out for scylla_bench.test - received only 1 responses from 2 CL=QUORUM.
2023/04/19 17:44:56 [query statement="SELECT pk, ck, v FROM scylla_bench.test WHERE pk = ? AND ck >= ?  LIMIT 2000 " values=[39 3977063] consistency=QUORUM] || ERROR: 0 attempts applied: Operation timed out for scylla_bench.test - received only 1 responses from 2 CL=QUORUM.
 time   ops/s  rows/s errors max    99.9th 99th   95th   90th   median mean   
2023/04/19 17:44:56 [query statement="SELECT pk, ck, v FROM scylla_bench.test WHERE pk = ? AND ck >= ?  LIMIT 2000 " values=[15 7113516] consistency=QUORUM] || ERROR: 0 attempts applied: Operation timed out for scylla_bench.test - received only 1 responses from 2 CL=QUORUM.
2023/04/19 17:44:56 [query statement="SELECT pk, ck, v FROM scylla_bench.test WHERE pk = ? AND ck >= ?  LIMIT 2000 " values=[55 3968802] consistency=QUORUM] || ERROR: 0 attempts applied: Operation timed out for scylla_bench.test - received only 1 responses from 2 CL=QUORUM.
...
<lots of similar timeout messages>
   6s    1264 2528000    328 4.8s   4.7s   4.7s   4.4s   4.2s   370ms  1.1s   
...
<lots of similar timeout messages>
...
2m0.1s     565 1131503    138 9.2s   8.7s   8.5s   8.3s   6s     18ms   1.3s   
2023/04/19 17:44:57 [query statement="SELECT pk, ck, v FROM scylla_bench.test WHERE pk = ? AND ck >= ?  LIMIT 2000 " values=[58 5460126] consistency=QUORUM] || ERROR: 0 attempts applied: Operation timed out for scylla_bench.test - received only 1 responses from 2 CL=QUORUM.
...
<lots of similar timeout messages>
...
2023/04/19 17:52:43 [query statement="SELECT pk, ck, v FROM scylla_bench.test WHERE pk = ? AND ck >= ?  LIMIT 2000 " values=[35 5246275] consistency=QUORUM] || ERROR: 0 attempts applied: EOF
2023/04/19 17:52:44 [query statement="SELECT pk, ck, v FROM scylla_bench.test WHERE pk = ? AND ck >= ?  LIMIT 2000 " values=[35 1570023] consistency=QUORUM] || ERROR: 0 attempts applied: EOF
2023/04/19 17:52:44 [query statement="SELECT pk, ck, v FROM scylla_bench.test WHERE pk = ? AND ck >= ?  LIMIT 2000 " values=[40 1738314] consistency=QUORUM] || ERROR: 0 attempts applied: Operation failed for scylla_bench.test - received 0 responses and 1 failures from 2 CL=QUORUM.
2023/04/19 17:52:44 [query statement="SELECT pk, ck, v FROM scylla_bench.test WHERE pk = ? AND ck >= ?  LIMIT 2000 " values=[54 4939889] consistency=QUORUM] || ERROR: 0 attempts applied: gocql: connection closed waiting for response
2023/04/19 17:52:44 [query statement="SELECT pk, ck, v FROM scylla_bench.test WHERE pk = ? AND ck >= ?  LIMIT 2000 " values=[9 440632] consistency=QUORUM] || ERROR: 0 attempts applied: gocql: connection closed waiting for response
...
2023/04/19 17:54:44 [query statement="SELECT pk, ck, v FROM scylla_bench.test WHERE pk = ? AND ck >= ?  LIMIT 2000 " values=[19 4463825] consistency=QUORUM] || ERROR: 0 attempts applied: Operation failed for scylla_bench.test - received 0 responses and 1 failures from 2 CL=QUORUM.
2023/04/19 17:56:26 error: failed to connect to "[HostInfo hostname=\"10.4.2.156\" connectAddress=\"10.4.2.156\" peer=\"10.4.2.156\" rpc_address=\"10.4.2.156\" broadcast_address=\"<nil>\" preferred_ip=\"<nil>\" connect_addr=\"10.4.2.156\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"eu-west\" rack=\"1a\" host_id=\"71ffdcba-9a8e-4cfd-8d68-ec36e4250afa\" version=\"v3.0.8\" state=DOWN num_tokens=256]" due to error: gocql: no response to connection startup within timeout
2023/04/19 17:56:59 error: failed to connect to "[HostInfo hostname=\"10.4.2.156\" connectAddress=\"10.4.2.156\" peer=\"10.4.2.156\" rpc_address=\"10.4.2.156\" broadcast_address=\"<nil>\" preferred_ip=\"<nil>\" connect_addr=\"10.4.2.156\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"eu-west\" rack=\"1a\" host_id=\"71ffdcba-9a8e-4cfd-8d68-ec36e4250afa\" version=\"v3.0.8\" state=DOWN num_tokens=256]" due to error: gocql: no response to connection startup within timeout
Killed

Note that the lost connection to the 10.4.2.156 node is expected, because it is the result of the soft_reboot_node nemesis:

2361909 < t:2023-04-19 17:45:48,389 f:nemesis.py      l:1742 c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: >>>>>>>>>>>>>Started random_disrupt_method soft_reboot_node
2361910 < t:2023-04-19 17:45:48,390 f:nemesis.py      l:371  c:sdcm.nemesis         p:INFO  > sdcm.nemesis.SisyphusMonkey: Current Target: Node longevity-large-partitions-4d-vp-br-db-node-4743601d-4 [3.252.162.48 | 10.4.2.156] (seed: False) with running nemesis: None
Nemesis list:

| Nemesis Name | Target node | Status | Start | End |
|---|---|---|---|---|
| disrupt_remove_node_then_add_node | longevity-large-partitions-4d-vp-br-db-node-4743601d-2 | Succeeded | 2023-04-19 19:00:19 | 2023-04-19 22:05:08 |
| disrupt_resetlocalschema | longevity-large-partitions-4d-vp-br-db-node-4743601d-4 | Succeeded | 2023-04-19 18:28:37 | 2023-04-19 18:29:40 |
| disrupt_nodetool_seed_decommission | longevity-large-partitions-4d-vp-br-db-node-4743601d-2 | Skipped | 2023-04-19 18:28:18 | 2023-04-19 18:28:18 |
| disrupt_network_random_interruptions | longevity-large-partitions-4d-vp-br-db-node-4743601d-4 | Skipped | 2023-04-19 18:27:57 | 2023-04-19 18:27:58 |
| disrupt_soft_reboot_node | longevity-large-partitions-4d-vp-br-db-node-4743601d-4 | Succeeded | 2023-04-19 17:52:39 | 2023-04-19 17:57:16 |

So we probably have some kind of memory leak here.

Logs: