scylladb / scylla-bench


A lower number of rows-per-partition than expected found after write load completed #113

Closed: yarongilor closed 1 year ago

yarongilor commented 1 year ago

Installation details

Kernel Version: 5.15.0-1023-aws
Scylla version (or git commit hash): 2022.2.0~rc5-20221121.feb292600fc4 with build-id 172f9538efb0893c97c86cdf05622925159f4fa2

Cluster size: 4 nodes (i4i.4xlarge)

Scylla Nodes used in this run:

OS / Image: ami-0b77b476432e37d90 (aws: eu-west-1)

Test: longevity-large-partition-4days-test-rq2
Test id: 10079e1e-18ba-4ffc-a6b0-e4a399b7d23c
Test name: scylla-staging/Longevity_yaron/longevity-large-partition-4days-test-rq2
Test config file(s):

Issue description

Scenario: scylla-bench is set to write 50k rows per partition, for 800 partitions. This load ran to completion in the prepare stress step, without any nemesis. An SCT longevity validation check then counts the rows of each partition and reports the result. The outcome is that some partitions have fewer than 50k rows.
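For context, the validation is conceptually a per-partition `SELECT COUNT(*)`. Below is a minimal sketch in Python, assuming scylla-bench's default keyspace/table (`scylla_bench.test`); the contact point and partition-key range are placeholders, and the real SCT check runs the query through cqlsh rather than through the driver:

```python
# Minimal sketch of the per-partition row-count validation (assumptions above).
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])            # hypothetical contact point
session = cluster.connect()

EXPECTED = 50_000
for pk in range(800):                      # illustrative partition-key range
    count = session.execute(
        "SELECT COUNT(*) FROM scylla_bench.test WHERE pk = %s", (pk,)
    ).one()[0]
    if count != EXPECTED:
        print(f"partition {pk}: expected {EXPECTED}, got {count}")
```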

The yaml configuration is:

yarongilor@yarongilor:~/Downloads/logs/sct-runner-10079e1e$ grep f:sct_config.py sct.log | grep clustering-row-count= -B 1
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  > - scylla-bench -workload=sequential -mode=write -replication-factor=3 -partition-count=100
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  >   -clustering-row-count=50000 -clustering-row-size=uniform:100..5120 -concurrency=250
--
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  > - scylla-bench -workload=sequential -mode=write -replication-factor=3 -partition-count=100
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  >   -clustering-row-count=50000 -partition-offset=101 -clustering-row-size=uniform:100..5120
--
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  > - scylla-bench -workload=sequential -mode=write -replication-factor=3 -partition-count=100
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  >   -clustering-row-count=50000 -partition-offset=201 -clustering-row-size=uniform:100..5120
--
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  > - scylla-bench -workload=sequential -mode=write -replication-factor=3 -partition-count=100
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  >   -clustering-row-count=50000 -partition-offset=301 -clustering-row-size=uniform:100..5120
--
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  > - scylla-bench -workload=sequential -mode=write -replication-factor=3 -partition-count=100
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  >   -clustering-row-count=50000 -partition-offset=401 -clustering-row-size=uniform:100..5120
--
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  > - scylla-bench -workload=sequential -mode=write -replication-factor=3 -partition-count=100
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  >   -clustering-row-count=50000 -partition-offset=501 -clustering-row-size=uniform:100..5120
--
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  > - scylla-bench -workload=sequential -mode=write -replication-factor=3 -partition-count=100
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  >   -clustering-row-count=50000 -partition-offset=601 -clustering-row-size=uniform:100..5120
--
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  > - scylla-bench -workload=sequential -mode=write -replication-factor=3 -partition-count=100
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  >   -clustering-row-count=50000 -partition-offset=701 -clustering-row-size=uniform:100..5120
--
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  > - scylla-bench -workload=uniform -mode=read -select-order-by=desc -replication-factor=3
< t:2022-11-30 09:42:38,900 f:sct_config.py   l:1707 c:sdcm.sct_config      p:INFO  >   -partition-count=800 -clustering-row-count=50000 -clustering-row-size=uniform:100..5120

The count output shows:

yarongilor@yarongilor:~/Downloads/logs/sct-runner-10079e1e$ grep 'Count result:' sct.log | grep -v 50000 -c
6
yarongilor@yarongilor:~/Downloads/logs/sct-runner-10079e1e$ grep 'Count result:' sct.log | grep 50000 -c
793
yarongilor@yarongilor:~/Downloads/logs/sct-runner-10079e1e$ grep 'Count result:' sct.log | grep -v 50000
< t:2022-11-30 10:20:04,699 f:tester.py       l:2475 c:LargePartitionLongevityTest p:DEBUG > Count result: ['', 'count', '-------', '26000', '', '(1 rows)']
< t:2022-11-30 10:21:25,742 f:tester.py       l:2475 c:LargePartitionLongevityTest p:DEBUG > Count result: ['', 'count', '-------', '48000', '', '(1 rows)']
< t:2022-11-30 10:23:37,900 f:tester.py       l:2475 c:LargePartitionLongevityTest p:DEBUG > Count result: ['', 'count', '-------', '46000', '', '(1 rows)']
< t:2022-11-30 10:26:03,296 f:tester.py       l:2475 c:LargePartitionLongevityTest p:DEBUG > Count result: ['', 'count', '-------', '24000', '', '(1 rows)']
< t:2022-11-30 10:27:03,741 f:tester.py       l:2475 c:LargePartitionLongevityTest p:DEBUG > Count result: ['', 'count', '-------', '40000', '', '(1 rows)']
< t:2022-11-30 10:27:13,169 f:tester.py       l:2475 c:LargePartitionLongevityTest p:DEBUG > Count result: ['', 'count', '-------', '38000', '', '(1 rows)']

We could probably try more debugging, e.g. retrying the counts, to see whether for some reason the row count is only updated after a while.

Logs:

No logs captured during this run.

Log links for testrun with test id 10079e1e-18ba-4ffc-a6b0-e4a399b7d23c

Date | Log type | Link
20221130_110838 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/10079e1e-18ba-4ffc-a6b0-e4a399b7d23c/20221130_110838/grafana-screenshot-longevity-large-partition-4days-test-rq2-scylla-per-server-metrics-nemesis-20221130_110948-longevity-large-partitions-order-by-monitor-node-10079e1e-1.png
20221130_110838 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/10079e1e-18ba-4ffc-a6b0-e4a399b7d23c/20221130_110838/grafana-screenshot-overview-20221130_110839-longevity-large-partitions-order-by-monitor-node-10079e1e-1.png
20221130_195014 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/10079e1e-18ba-4ffc-a6b0-e4a399b7d23c/20221130_195014/grafana-screenshot-longevity-large-partition-4days-test-rq2-scylla-per-server-metrics-nemesis-20221130_195050-longevity-large-partitions-order-by-monitor-node-10079e1e-1.png
20221130_195014 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/10079e1e-18ba-4ffc-a6b0-e4a399b7d23c/20221130_195014/grafana-screenshot-overview-20221130_195014-longevity-large-partitions-order-by-monitor-node-10079e1e-1.png
20221130_195821 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/10079e1e-18ba-4ffc-a6b0-e4a399b7d23c/20221130_195821/grafana-screenshot-longevity-large-partition-4days-test-rq2-scylla-per-server-metrics-nemesis-20221130_195857-longevity-large-partitions-order-by-monitor-node-10079e1e-1.png
20221130_195821 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/10079e1e-18ba-4ffc-a6b0-e4a399b7d23c/20221130_195821/grafana-screenshot-overview-20221130_195821-longevity-large-partitions-order-by-monitor-node-10079e1e-1.png
20221201_091457 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/10079e1e-18ba-4ffc-a6b0-e4a399b7d23c/20221201_091457/grafana-screenshot-longevity-large-partition-4days-test-rq2-scylla-per-server-metrics-nemesis-20221201_091534-longevity-large-partitions-order-by-monitor-node-10079e1e-1.png
20221201_091457 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/10079e1e-18ba-4ffc-a6b0-e4a399b7d23c/20221201_091457/grafana-screenshot-overview-20221201_091457-longevity-large-partitions-order-by-monitor-node-10079e1e-1.png
20221201_094657 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/10079e1e-18ba-4ffc-a6b0-e4a399b7d23c/20221201_094657/grafana-screenshot-longevity-large-partition-4days-test-rq2-scylla-per-server-metrics-nemesis-20221201_094733-longevity-large-partitions-order-by-monitor-node-10079e1e-1.png
20221201_094657 | grafana | https://cloudius-jenkins-test.s3.amazonaws.com/10079e1e-18ba-4ffc-a6b0-e4a399b7d23c/20221201_094657/grafana-screenshot-overview-20221201_094657-longevity-large-partitions-order-by-monitor-node-10079e1e-1.png
20221201_125438 | db-cluster | https://cloudius-jenkins-test.s3.amazonaws.com/10079e1e-18ba-4ffc-a6b0-e4a399b7d23c/20221201_125438/db-cluster-10079e1e.tar.gz
20221201_125438 | loader-set | https://cloudius-jenkins-test.s3.amazonaws.com/10079e1e-18ba-4ffc-a6b0-e4a399b7d23c/20221201_125438/loader-set-10079e1e.tar.gz
20221201_125438 | monitor-set | https://cloudius-jenkins-test.s3.amazonaws.com/10079e1e-18ba-4ffc-a6b0-e4a399b7d23c/20221201_125438/monitor-set-10079e1e.tar.gz
20221201_125438 | sct | https://cloudius-jenkins-test.s3.amazonaws.com/10079e1e-18ba-4ffc-a6b0-e4a399b7d23c/20221201_125438/sct-runner-10079e1e.tar.gz

Jenkins job URL

yarongilor commented 1 year ago

The issue was reproduced in a subsequent run:

yarongilor@yarongilor:~/Downloads/logs/sct-runner-8685a104$ grep 'Count result:' sct.log | grep 50000 -c
1597
yarongilor@yarongilor:~/Downloads/logs/sct-runner-8685a104$ grep 'Count result:' sct.log | grep -v 50000 -c
1
yarongilor@yarongilor:~/Downloads/logs/sct-runner-8685a104$ grep 'Count result:' sct.log | grep -v 50000 
< t:2022-12-01 14:04:11,048 f:tester.py       l:2475 c:LargePartitionLongevityTest p:DEBUG > Count result: ['', 'count', '-------', '46000', '', '(1 rows)']
vponomaryov commented 1 year ago

Look at the following logs:

Configuration
Mode:            write
Workload:        sequential
Timeout:         3m0s
Retries:        
  number:        10
  min interval:      80ms
  max interval:      1s
Consistency level:   quorum
Partition count:     100
Partition offset:    401
Clustering rows:     50000
Clustering row size:     Uniform(min=100, max=5120)
Rows per request:    2000
Page size:       1000
Concurrency:         250
Connections:         150
Maximum rate:        unlimited
Client compression:  true
Hdr memory consumption:  122948000 bytes

And the following non-50k row-count results:

< t:2022-11-30 10:20:04,699 > Count result: ['', 'count', '-------', '26000', '', '(1 rows)']

< t:2022-11-30 10:21:25,742 > Count result: ['', 'count', '-------', '48000', '', '(1 rows)']

< t:2022-11-30 10:23:37,900 > Count result: ['', 'count', '-------', '46000', '', '(1 rows)']

< t:2022-11-30 10:26:03,296 > Count result: ['', 'count', '-------', '24000', '', '(1 rows)']

< t:2022-11-30 10:27:03,741 > Count result: ['', 'count', '-------', '40000', '', '(1 rows)']

< t:2022-11-30 10:27:13,169 > Count result: ['', 'count', '-------', '38000', '', '(1 rows)']

We can see that the amount of missing data is always a multiple of 2k, which is the number of rows per request. Summing the shortfalls, 39 different write requests were not successful.
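A quick sanity check of that arithmetic, using the counts from the log lines above:

```python
# Every shortfall is a multiple of the 2000-row request size,
# and the shortfalls add up to 39 missing write requests in total.
counts = [26000, 48000, 46000, 24000, 40000, 38000]
expected, rows_per_request = 50_000, 2_000

missing = [expected - c for c in counts]
assert all(m % rows_per_request == 0 for m in missing)
print(sum(missing) // rows_per_request)  # -> 39
```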

As a first step, I propose setting the consistency level to ALL instead of QUORUM for the 'prepare write' load. I expect all partitions to then come back with 50k rows.

yarongilor commented 1 year ago

@vponomaryov right, but if the run_cqlsh queries used a quorum there should be no problem. Perhaps the issue is that the CL is not 'quorum' in the cqlsh `select count(*)` queries.

roydahan commented 1 year ago
  1. Default CL of cqlsh is ONE.
  2. Why are you running it with cqlsh and not the CQL driver?
  3. I don't think the problem is with the query, but if you have a reproducer and a specific partition number, it would be good to query it specifically when the test reaches this point.
  4. Check the timeout / failed-write metrics during the prepare to verify @vponomaryov's theory.
yarongilor commented 1 year ago
  1. Default CL of cqlsh is ONE.

Great, this supports the estimation above.

  2. Why are you running it with cqlsh and not the CQL driver?

It's old code from a long time ago. It probably always worked OK with the s-b versions used in the past.

  3. I don't think the problem is with the query, but if you have a reproducer and a specific partition number, it would be good to query it specifically when the test reaches this point.

I'll run a reproducer with a quorum query using the driver: https://jenkins.scylladb.com/job/scylla-staging/job/Longevity_yaron/job/longevity-large-partition-4days-test-rq2/16/

  4. Check the timeout / failed-write metrics during the prepare to verify @vponomaryov's theory.

(Oops, perhaps I forgot to mention) I'm not sure about that because, IIRC, at the end of the test this query did get the right number of rows (that's why the test got the validation error).

yarongilor commented 1 year ago

The issue is not reproduced using a driver query with a quorum CL:

ubuntu@ip-10-4-0-134:~/sct-results/latest$ grep 'Count result:' sct.log | grep -v 50000
ubuntu@ip-10-4-0-134:~/sct-results/latest$ grep 'Count result:' sct.log | grep -c 50000
799

fixed in: https://github.com/scylladb/scylla-cluster-tests/pull/5335/commits/462a35531e9aec945dba46be914079757b5e7285
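For reference, a driver-based count with an explicit QUORUM consistency level looks roughly like the sketch below. This is not the actual code from the PR; the contact point and partition key are placeholders, and `scylla_bench.test` is the scylla-bench default keyspace/table:

```python
# Sketch: count rows of one partition with CL=QUORUM via the Python driver.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])   # hypothetical contact point
session = cluster.connect()

# cqlsh defaults to CL=ONE; here the CL is pinned to QUORUM on the statement,
# so the read sees everything the QUORUM writes of the prepare step stored.
stmt = SimpleStatement(
    "SELECT COUNT(*) FROM scylla_bench.test WHERE pk = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
print(session.execute(stmt, (1,)).one()[0])
```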

yarongilor commented 1 year ago

fixed in https://github.com/scylladb/scylla-cluster-tests/pull/5335:

ubuntu@ip-10-4-1-126:~/sct-results/latest$ grep 'Count result:' sct.log | grep -c 50000
800
ubuntu@ip-10-4-1-126:~/sct-results/latest$ grep 'Count result:' sct.log | grep -v 50000
ubuntu@ip-10-4-1-126:~/sct-results/latest$ 
ubuntu@ip-10-4-1-126:~/sct-results/latest$ cat test_id 
42bbe08b-c76f-4d7a-a019-8aa574535076