scylladb / scylla-bench


Add query retries on the scylla-bench level #120

Closed vponomaryov closed 1 year ago

vponomaryov commented 1 year ago

Add the ability to handle query retries in scylla-bench itself, which avoids various gocql bugs in this area. The scylla-bench handler is enabled by default. To change the retry handler, use the following new option:

-retry-handler=gocql
-retry-handler=sb

Only one retry handler is used at a time. The new option accepts only the values sb and gocql.
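To illustrate the option described above, here is a minimal sketch of how a flag restricted to the two supported values could be parsed and validated. It is not the code from this PR: the helper name `validateRetryHandler`, the help text, and the exit code are assumptions made for the example; only the flag name, its two values, and the `sb` default come from the description.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// validateRetryHandler (hypothetical helper) returns an error unless the value
// is one of the two handlers named in the description: "sb" or "gocql".
func validateRetryHandler(v string) error {
	switch v {
	case "sb", "gocql":
		return nil
	default:
		return fmt.Errorf("unsupported -retry-handler value %q: expected \"sb\" or \"gocql\"", v)
	}
}

func main() {
	// The "sb" default mirrors "enable it by default" from the description.
	handler := flag.String("retry-handler", "sb", "who handles query retries: sb or gocql")
	flag.Parse()

	if err := validateRetryHandler(*handler); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("retry handler:", *handler)
}
```

Rejecting unknown values at startup keeps a typo in the option from silently falling back to one of the handlers.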

vponomaryov commented 1 year ago

Read retry example:

 time   ops/s  rows/s errors max    99.9th 99th   95th   90th   median mean   
   1s    4357       0      0 9.2ms  8.7ms  3.6ms  3.1ms  2.9ms  2.2ms  2.3ms  
   2s    4491       0      0 5.6ms  4.2ms  3.5ms  3.1ms  2.8ms  2.1ms  2.2ms  
   3s    4011       0      0 46ms   44ms   6ms    3.6ms  3.1ms  2.2ms  2.5ms
2023/03/01 12:58:51 SELECT pk, ck, v FROM scylla_bench.test WHERE pk = 78 AND ck >= 1250  LIMIT 10  || retry: attempt №0
2023/03/01 12:58:51 SELECT pk, ck, v FROM scylla_bench.test WHERE pk = 80 AND ck >= 1256  LIMIT 10  || retry: attempt №0
2023/03/01 12:58:51 SELECT pk, ck, v FROM scylla_bench.test WHERE pk = 92 AND ck >= 1255  LIMIT 10  || retry: attempt №0
2023/03/01 12:58:51 SELECT pk, ck, v FROM scylla_bench.test WHERE pk = 90 AND ck >= 1256  LIMIT 10  || retry: attempt №0
2023/03/01 12:58:51 SELECT pk, ck, v FROM scylla_bench.test WHERE pk = 76 AND ck >= 1255  LIMIT 10  || retry: attempt №0
2023/03/01 12:58:51 SELECT pk, ck, v FROM scylla_bench.test WHERE pk = 82 AND ck >= 1250  LIMIT 10  || retry: attempt №0

   4s    4692       0      0 38ms   29ms   9.9ms  3.7ms  2.2ms  1.8ms  2.1ms  
   5s    5594       0      0 4.3ms  3.3ms  2.3ms  2ms    2ms    1.8ms  1.8ms  
   6s    4742       0      0 12ms   11ms   5.6ms  3.5ms  2.8ms  1.9ms  2.1ms  
   7s    5669       0      0 7.8ms  6.6ms  2.5ms  2.1ms  2ms    1.7ms  1.8ms  
   8s    5704       0      0 2.8ms  2.6ms  2.1ms  2ms    1.9ms  1.7ms  1.8ms  

Write retry example:

 time   ops/s  rows/s errors max    99.9th 99th   95th   90th   median mean   
   1s     294     294      0 13ms   13ms   13ms   8.6ms  7.5ms  4.9ms  5.4ms  
   2s     294     294      0 12ms   12ms   12ms   9.5ms  8.8ms  5.5ms  5.9ms  
   3s     294     294      0 17ms   17ms   16ms   9.8ms  8.7ms  5.4ms  5.9ms  
   4s     294     294      0 11ms   11ms   11ms   8.7ms  8.4ms  6ms    6ms
2023/03/01 13:01:25 INSERT INTO scylla_bench.test (pk, ck, v) VALUES (435, 18, <15-bytes-value>) || retry: attempt №0
2023/03/01 13:01:25 INSERT INTO scylla_bench.test (pk, ck, v) VALUES (364, 18, <21-bytes-value>) || retry: attempt №0
2023/03/01 13:01:25 INSERT INTO scylla_bench.test (pk, ck, v) VALUES (80, 18, <25-bytes-value>) || retry: attempt №0
2023/03/01 13:01:25 INSERT INTO scylla_bench.test (pk, ck, v) VALUES (151, 18, <27-bytes-value>) || retry: attempt №0
2023/03/01 13:01:25 INSERT INTO scylla_bench.test (pk, ck, v) VALUES (222, 18, <27-bytes-value>) || retry: attempt №0

   5s     294     294      0 14ms   14ms   14ms   8ms    6.8ms  4.3ms  4.6ms  
   6s     294     294      0 4.3ms  4.3ms  4.3ms  2.7ms  2.6ms  2.4ms  2.4ms  
   7s     294     294      0 5ms    5ms    4.6ms  4ms    3.1ms  2.4ms  2.6ms  
   8s     294     294      0 6.5ms  6.5ms  5.6ms  5.4ms  5.1ms  4.2ms  4ms    

The examples above show how scylla-bench reacts when one Scylla pod is killed.

vponomaryov commented 1 year ago

@piodul Please review it. Context: https://github.com/scylladb/qa-tasks/issues/415

vponomaryov commented 1 year ago

Example outputs showing the reaction to one Scylla pod being killed:

read:

   1s    4517       0      0 22ms   15ms   3.5ms  3.1ms  2.9ms  2.1ms  2.2ms  
   2s    4465       0      0 15ms   15ms   3.5ms  3.1ms  2.8ms  2.2ms  2.2ms
2023/03/01 19:59:48 [query statement="SELECT pk, ck, v FROM scylla_bench.test WHERE pk = ? AND ck >= ?  LIMIT 10 " values=[80 1216] consistency=QUORUM] || retry: attempt №0, sleep for 48.372823ms
2023/03/01 19:59:48 [query statement="SELECT pk, ck, v FROM scylla_bench.test WHERE pk = ? AND ck >= ?  LIMIT 10 " values=[82 1216] consistency=QUORUM] || retry: attempt №0, sleep for 75.240727ms

   3s    4028       0      0 14ms   12ms   7.3ms  4ms    3.3ms  2.2ms  2.4ms  
   4s    5550       0      0 3.2ms  3.2ms  2.2ms  2.1ms  2ms    1.8ms  1.8ms 

write:

   1s    2870    2870      0 17ms   11ms   8.1ms  6.1ms  5.4ms  3.2ms  3.4ms  
   2s    2862    2862      0 9ms    9ms    7.3ms  6.3ms  5.4ms  3.3ms  3.5ms  
   3s    2797    2797      0 30ms   25ms   8.1ms  6.2ms  5.5ms  3.2ms  3.5ms
2023/03/01 20:04:31 [query statement="INSERT INTO scylla_bench.test (pk, ck, v) VALUES (?, ?, ?)" values=[1241 860 <2168-bytes-value>] consistency=QUORUM] || retry: attempt №0, sleep for 77.163133ms
2023/03/01 20:04:31 [query statement="INSERT INTO scylla_bench.test (pk, ck, v) VALUES (?, ?, ?)" values=[1061 861 <783-bytes-value>] consistency=QUORUM] || retry: attempt №0, sleep for 50.660343ms
2023/03/01 20:04:31 [query statement="INSERT INTO scylla_bench.test (pk, ck, v) VALUES (?, ?, ?)" values=[1151 860 <2902-bytes-value>] consistency=QUORUM] || retry: attempt №0, sleep for 32.645856ms
2023/03/01 20:04:31 [query statement="INSERT INTO scylla_bench.test (pk, ck, v) VALUES (?, ?, ?)" values=[1001 862 <6258-bytes-value>] consistency=QUORUM] || retry: attempt №0, sleep for 78.015527ms

   4s    3091    3091      0 35ms   24ms   9.3ms  5.2ms  4.5ms  2.9ms  3.1ms  
   5s    3679    3679      0 6.5ms  6.5ms  5.8ms  4.8ms  4.1ms  2.6ms  2.7ms  

batch write:

   1s    1463   14630      0 39ms   39ms   15ms   12ms   9.8ms  5.8ms  6.4ms  
   2s     977    9770      0 102ms  102ms  76ms   18ms   15ms   7.4ms  9.8ms
2023/03/01 20:02:26 BATCH >>> [query statement="INSERT INTO scylla_bench.test (pk, ck, v) VALUES (?, ?, ?)" pk=1001 cks=[2640 2641 2642 2643 2644 2645 2646 2647 2648 2649] vSizes=[445 6237 1021 9798 942 8164 2621 237 5295 2984] consistency=QUORUM] || retry: attempt №0, sleep for 332.039µs
2023/03/01 20:02:26 BATCH >>> [query statement="INSERT INTO scylla_bench.test (pk, ck, v) VALUES (?, ?, ?)" pk=1061 cks=[2650 2651 2652 2653 2654 2655 2656 2657 2658 2659] vSizes=[2321 726 5821 5780 1570 4146 10204 408 4687 3926] consistency=QUORUM] || retry: attempt №0, sleep for 10.326035ms
2023/03/01 20:02:26 BATCH >>> [query statement="INSERT INTO scylla_bench.test (pk, ck, v) VALUES (?, ?, ?)" pk=1241 cks=[2630 2631 2632 2633 2634 2635 2636 2637 2638 2639] vSizes=[8344 9946 2818 1197 287 9293 10198 6770 2603 2150] consistency=QUORUM] || retry: attempt №0, sleep for 6.999505ms
2023/03/01 20:02:26 BATCH >>> [query statement="INSERT INTO scylla_bench.test (pk, ck, v) VALUES (?, ?, ?)" pk=1151 cks=[2690 2691 2692 2693 2694 2695 2696 2697 2698 2699] vSizes=[3341 6976 8934 8988 7969 3337 7624 1940 6564 8769] consistency=QUORUM] || retry: attempt №0, sleep for 79.645347ms

   3s     728    7280      0 119ms  118ms  80ms   28ms   20ms   9.5ms  13ms   
   4s    1020   10200      0 57ms   57ms   48ms   17ms   14ms   8.2ms  9.2ms  

counter_read:

   1s    2672    2672      0 8.3ms  6.1ms  4.3ms  3.1ms  2.8ms  1.8ms  1.9ms  
   2s    2723    2723      0 5.6ms  4.7ms  3.8ms  3ms    2.7ms  1.8ms  1.8ms
2023/03/01 20:12:17 [query statement="SELECT pk, ck, c1, c2, c3, c4, c5 FROM scylla_bench.test_counters WHERE pk = ? AND ck >= ?  LIMIT 1 " values=[4 2] consistency=QUORUM] || retry: attempt №0, sleep for 48.372823ms
2023/03/01 20:12:17 [query statement="SELECT pk, ck, c1, c2, c3, c4, c5 FROM scylla_bench.test_counters WHERE pk = ? AND ck >= ?  LIMIT 1 " values=[0 4] consistency=QUORUM] || retry: attempt №0, sleep for 75.240727ms
2023/03/01 20:12:17 [query statement="SELECT pk, ck, c1, c2, c3, c4, c5 FROM scylla_bench.test_counters WHERE pk = ? AND ck >= ?  LIMIT 1 " values=[1 1] consistency=QUORUM] || retry: attempt №0, sleep for 53.164804ms

   3s    2521    2521      0 27ms   21ms   7.3ms  4ms    2.9ms  1.7ms  1.9ms

counter_update:

   1s    1975    1975      0 14ms   14ms   5.9ms  4.3ms  3.7ms  2.4ms  2.5ms  
   2s    2011    2011      0 7ms    6.8ms  5.4ms  4.1ms  3.5ms  2.4ms  2.5ms
2023/03/01 20:08:21 [query statement="UPDATE scylla_bench.test_counters SET c1 = c1 + ?, c2 = c2 + ?, c3 = c3 + ?, c4 = c4 + ?, c5 = c5 + ? WHERE pk = ? AND ck = ?" values=[1 2 3 4 5 0 1] consistency=QUORUM] || retry: attempt №0, sleep for 48.372823ms
2023/03/01 20:08:21 [query statement="UPDATE scylla_bench.test_counters SET c1 = c1 + ?, c2 = c2 + ?, c3 = c3 + ?, c4 = c4 + ?, c5 = c5 + ? WHERE pk = ? AND ck = ?" values=[0 1 2 3 4 1 0] consistency=QUORUM] || retry: attempt №0, sleep for 75.240727ms
2023/03/01 20:08:21 [query statement="UPDATE scylla_bench.test_counters SET c1 = c1 + ?, c2 = c2 + ?, c3 = c3 + ?, c4 = c4 + ?, c5 = c5 + ? WHERE pk = ? AND ck = ?" values=[4 5 6 7 8 4 4] consistency=QUORUM] || retry: attempt №0, sleep for 53.164804ms
2023/03/01 20:08:21 [query statement="UPDATE scylla_bench.test_counters SET c1 = c1 + ?, c2 = c2 + ?, c3 = c3 + ?, c4 = c4 + ?, c5 = c5 + ? WHERE pk = ? AND ck = ?" values=[3 4 5 6 7 1 3] consistency=QUORUM] || retry: attempt №0, sleep for 35.017134ms
2023/03/01 20:08:21 [query statement="UPDATE scylla_bench.test_counters SET c1 = c1 + ?, c2 = c2 + ?, c3 = c3 + ?, c4 = c4 + ?, c5 = c5 + ? WHERE pk = ? AND ck = ?" values=[3 4 5 6 7 4 3] consistency=QUORUM] || retry: attempt №0, sleep for 33.970999ms

   3s    2109    2109      0 12ms   11ms   8.8ms  5.8ms  3.7ms  1.9ms  2.3ms  
   4s    2800    2800      0 6.5ms  6.1ms  4.7ms  3.2ms  2.7ms  1.7ms  1.8ms 

scan:

 1.9s     179   70458      0 1.9s   1.9s   687ms  516ms  40ms   28ms   73ms   
 2.6s     173   40283      0 636ms  636ms  602ms  80ms   35ms   25ms   51ms
2023/03/01 20:10:12 [query statement="SELECT * FROM scylla_bench.test WHERE token(pk) >= ? AND token(pk) <= ?" values=[-6428410813565449808 -5869418568907584609] consistency=QUORUM] || retry: attempt №0, sleep for 48.372823ms
2023/03/01 20:10:12 [query statement="SELECT * FROM scylla_bench.test WHERE token(pk) >= ? AND token(pk) <= ?" values=[-8105387547539045408 -7546395302881180209] consistency=QUORUM] || retry: attempt №0, sleep for 75.240727ms
2023/03/01 20:10:12 [query statement="SELECT * FROM scylla_bench.test WHERE token(pk) >= ? AND token(pk) <= ?" values=[3074457345618258592 3633449590276123791] consistency=QUORUM] || retry: attempt №0, sleep for 53.164804ms