@fruch, please review / confirm assigning this to Dev.
Cc @elcallio, @nyh
I need help understanding what problem you actually see.
- You said there is a "YCSB verification error" and quoted some lines, but I don't understand at all what this is saying, or what kind of error this is... Are there missing rows? Wrong rows? Or what? It would be helpful to know what kind of error we see, not just that it "failed".
- Is the write workload using CL=QUORUM and the read using CL=QUORUM as well?
- Are we sure that the workload had zero write failures? A write error which is retried is fine. If there is a write error which is not retried, it can lead to only one replica succeeding, and cause random results from reads (of both the base table and the CDC log table, separately).
All that being said, since handling cluster changes - as you have in this case - is the most complicated part of CDC, and Alternator Streams makes it even more complicated - I wouldn't be surprised if this is indeed an Alternator or CDC bug. @yarongilor, @fruch, can you make an attempt to prepare a minimal reproducer? Maybe it doesn't need a 10-hour run and hundreds of node decommissions - maybe it can be reproduced with a much shorter run and just one decommission? (I don't know if it can, but maybe...)
We'll try to see if we can come up with a smaller-sized reproducer, but it might take us a bit more than the 10 hours it's gonna run.
1) We are not running YCSB with full logs; we could try, but that would generate a ton of them.
2) I'm not sure why you are asking about CL? It's Alternator, right? We have control only over reads... and in those cases the cluster was healthy when reading (i.e. it wasn't inside of a nemesis).
3) We'll need to check for write failures; we didn't take a closer look.
On top of that we have a few issues on the test end, for example not collecting the hydra-kcl log (i.e. the log of the process which reads from the stream).
We might need a bit of help defining which important parts we should add logs to for debugging (at the KCL end).
One more option is to try a reproducer in dtest, without KCL (to begin with), i.e. reading from the stream while doing decommission or other topology-change tasks. It's a task on our backlog, but it's not started yet. We currently only have a test that reads the stream before and after the decommission.
The run mentioned here is also with an old version of monitoring; we might have a better view with a monitoring version that has a dashboard for the Alternator Streams APIs, and we'll be running next with the newer version.
@yarongilor I am asking all these questions because if I were tasked to debug this, I would have no idea what your workload is doing, or what exactly is the "proof" that there is even a problem. How does the test recognize an error? It sees that the two tables are different? Do we know what is different - does one table (which one?) have more rows? Errors in the data? Or what?
You're right that Alternator writes are always CL=QUORUM. Even if all writes succeeded (did they? it's important to know that...), you still need to do CL=QUORUM reads (i.e., Alternator consistent reads), otherwise you can miss some data. You can't say that the cluster is "healthy" if you didn't do a repair - did you?
I'll check that we are using consistent reads in the verification stage; we might have missed that.
We by default use consistent reads (on all YCSB runs): https://github.com/scylladb/scylla-cluster-tests/blob/master/sdcm/ycsb_thread.py#L134
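For reference, a "consistent read" at the DynamoDB API level is just the ConsistentRead flag on the read call, which Alternator serves at CL=LOCAL_QUORUM. A minimal boto3 sketch of what the verification read should look like (the endpoint, credentials and key value are placeholders, not the test's actual configuration):

```python
import boto3

# Placeholders: point this at the Alternator endpoint used by the test.
dynamodb = boto3.resource('dynamodb',
                          endpoint_url='http://127.0.0.1:8080',
                          region_name='us-east-1',
                          aws_access_key_id='alternator',
                          aws_secret_access_key='secret_pass')
table = dynamodb.Table('usertable')

# ConsistentRead=True is the part that matters: without it the read is
# eventually consistent and can legitimately miss recently-written data.
item = table.get_item(Key={'p': 'user1'}, ConsistentRead=True).get('Item')
print(item)
```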
On Tue, Feb 2, 2021 at 3:41 PM nyh notifications@github.com wrote:
> Some more thoughts about this test:
> - An idea raised by @slivne is that we should run the verification ycsb also on the base table - maybe some data is missing already there, and not just in the copy?
> - The YCSB verification code - does it use the DynamoDB API? If so, is it using consistent read (which translates to CL=local_quorum)? If not, it should.
> - I would like to be confident that none of the base-table writes failed, and also that the KCL code did not encounter any write errors, which could mean that some data was read from the stream but just failed to be written to the second table.
> - Maybe @asias or @slivne can help me understand here, but is "nodetool decommission" and then adding a new node even safe in the sense of consistency? If we take three nodes and decommission one, isn't it possible that we are left with two replicas with different values for a row (old and new), and then when we add a new node, it copies data from just one of them, and may copy the old version - and now we have two copies of the old value and one of the new, and a quorum read can read old data?
In this case it's not safe - you are losing one replica.
Yet in any other case in which 3 replicas remain, the process is supposed to be safe.
Starting from 3 nodes, adding one, decommissioning another is safe. Starting from 4 nodes, decommissioning one, adding another is safe.
@fruch can you tell me a bit about what the "decommission nemesis" is doing? Is it adding a node and then decommissioning a node, or the opposite order?
@slivne is the safe decommission procedure documented somewhere? I'm not sure I understand why even your order is safe: consider we have 3 replicas, two with the new value, one with the old. Now we add a node. That, if I understand correctly, streams some random copy to the new node - so the new node can get the old value. At this point we have 4 replicas, two new values, two old values. Now we do a decommission. Let's say we decommission a node with the new value. If I understand correctly (correct me if I'm wrong) it sends this new value to just one (not all, as in repair) of the other nodes. If it happens to send it to the node that already had the new value, we end up with 3 replicas, one new and two old - bad.
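To make the counting concrete, here is a toy enumeration of that scenario - a deliberately simplified model of bootstrap/decommission streaming, not Scylla's actual logic:

```python
# Toy model: 3 replicas, two holding the NEW value, one holding the OLD value.
start = ('new', 'new', 'old')

bad = set()
for source in start:                        # bootstrap copies one replica
    cluster = start + (source,)
    for leaving in range(4):                # decommission any one node
        rest = [v for i, v in enumerate(cluster) if i != leaving]
        for target in range(3):             # leaving node streams to one node
            merged = list(rest)
            if cluster[leaving] == 'new':
                merged[target] = 'new'      # the newer write wins on the target
            if merged.count('old') >= 2:    # a quorum (2 of 3) can now read OLD
                bad.add(tuple(sorted(merged)))

print(bad)  # {('new', 'old', 'old')} - a stale quorum read is possible
```

The only bad path this model finds is the one described above: bootstrap copies the old value, and the decommissioned node streams its new value to a replica that already had it.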
Thanks @nyh. What currently slows down the clarification of the issue is that it is not reproduced consistently. Once we come up with a more consistent reproduction, we can try narrowing down the scenario.
Perhaps a somewhat useful new insight is that the issue wasn't reproduced with the following nemeses, where no decommission ran:
restart_then_repair_node
major_compaction
add_drop_column
multiple_hard_reboot_node
nodetool_drain
mgmt_backup_specific_keyspaces
snapshot_operations
multiple_hard_reboot_node
mgmt_repair_cli
kill_scylla
nodetool_cleanup
run_cdcstressor_tool
repair_streaming_err
The issue did reproduce in another test that ran some kind of remove-node-then-add-node, with this nemesis sequence:
mgmt_repair_cli
run_unique_sequence
remove_node_then_add_node
mgmt_repair_cli
multiple_hard_reboot_node
nodetool_enospc
restart_then_repair_node
nodetool_drain
decommission_streaming_err
network_reject_thrift
truncate_large_partition
> @fruch can you tell me a bit about what the "decommission nemesis" is doing? Is it adding a node and then decommissioning a node, or the opposite order?
In our test there are 6 nodes; the decommission nemesis is:
1) decommission one node using `nodetool decommission`
2) delete this node from AWS
3) provision and add a new node (wait for streaming to end, and for the node to be available)
4) run `nodetool cleanup` on all of the nodes
(basically following this: https://docs.scylladb.com/operating-scylla/procedures/cluster-management/remove_node/)
Thanks @fruch. It seems it is using the reverse order from the instructions? Please see above why I think it can lead to only one out of three replicas having the correct content, and wrong quorum-read results. Are you really using the reverse order, or did you just explain it here in the wrong order?
I'm not sure (not even close to sure...) that this is the problem we are seeing here, though. It might be a completely unrelated red herring.
@nyh, @fruch, there may be a bigger hint at the source of the problem. Look at the periodic cfstats report of the source and destination stream tables: the number of partitions reported for the destination table ceases to increase after the first decommission, while in another test without decommission it kept increasing, tracking the source table, up to the end of the writes.
The number of partitions in the dest table gets stuck around 800K-1M after the decommission until the write load ends, while the source table gets to 40M (ignore the "has not been published or dumped, maybe you missed .publish()" testing issue):
< t:2021-02-02 13:54:54,214 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 0/0 ==
< t:2021-02-02 13:54:54,214 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[10.113263845443726/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 0/0 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 13:55:11,490 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 0/13707 ==
< t:2021-02-02 13:55:11,490 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[27.389247179031372/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 0/13707 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 13:55:35,167 f:nemesis.py l:566 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.DecommissionMonkey: Set current_disruption -> DecommissionMonkey on target node Node alternator-streams-nemesis-stream-w-db-node-3a649026-6 [54.75.30.201 | 10.0.2.194] (seed: False)
< t:2021-02-02 13:56:13,900 f:nemesis.py l:566 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.DecommissionMonkey: Set current_disruption -> Decommission Node alternator-streams-nemesis-stream-w-db-node-3a649026-6 [54.75.30.201 | 10.0.2.194] (seed: False)
< t:2021-02-02 13:57:38,338 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 144306/307163 ==
< t:2021-02-02 13:57:38,338 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[174.23685503005981/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 144306/307163 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:00:05,646 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 387845/563285 ==
< t:2021-02-02 14:00:05,646 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[321.54533791542053/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 387845/563285 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:02:40,940 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 614767/791003 ==
< t:2021-02-02 14:02:40,940 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[476.83937549591064/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 614767/791003 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:05:08,205 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 834305/1013601 ==
< t:2021-02-02 14:05:08,205 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[624.1042518615723/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 834305/1013601 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:07:26,488 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 828948/1109028 ==
< t:2021-02-02 14:07:26,488 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[762.3874325752258/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 828948/1109028 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:09:47,539 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 851657/1404906 ==
< t:2021-02-02 14:09:47,539 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[903.4381284713745/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 851657/1404906 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:12:33,990 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1019046/1920401 ==
< t:2021-02-02 14:12:33,990 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[1069.8891949653625/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1019046/1920401 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:14:51,424 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1018787/2176932 ==
< t:2021-02-02 14:14:51,424 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[1207.3235836029053/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1018787/2176932 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:17:13,430 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 828948/2162728 ==
< t:2021-02-02 14:17:13,430 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[1349.3288397789001/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 828948/2162728 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:19:47,497 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/2739892 ==
< t:2021-02-02 14:19:47,497 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[1503.3963222503662/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/2739892 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:22:14,230 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1018787/3001382 ==
< t:2021-02-02 14:22:14,230 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[1650.1296529769897/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1018787/3001382 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:24:42,605 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 851657/3039601 ==
< t:2021-02-02 14:24:42,605 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[1798.504313468933/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 851657/3039601 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:27:04,404 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 828948/3210123 ==
< t:2021-02-02 14:27:04,404 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[1940.303292274475/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 828948/3210123 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:29:29,591 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/3839237 ==
< t:2021-02-02 14:29:29,592 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[2085.490391731262/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/3839237 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:31:53,517 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 851657/3823202 ==
< t:2021-02-02 14:31:53,517 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[2229.416089773178/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 851657/3823202 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:34:16,669 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1074511/4565785 ==
< t:2021-02-02 14:34:16,669 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[2372.5681977272034/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1074511/4565785 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:36:39,111 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/4653125 ==
< t:2021-02-02 14:36:39,112 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[2515.0106549263/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/4653125 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:37:35,934 f:nemesis.py l:566 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.DecommissionMonkey: Set current_disruption -> DecommissionMonkey on target node Node alternator-streams-nemesis-stream-w-db-node-3a649026-2 [54.229.107.68 | 10.0.0.138] (seed: False)
< t:2021-02-02 14:38:09,036 f:nemesis.py l:566 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.DecommissionMonkey: Set current_disruption -> Decommission Node alternator-streams-nemesis-stream-w-db-node-3a649026-2 [54.229.107.68 | 10.0.0.138] (seed: False)
< t:2021-02-02 14:38:57,808 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/5824079 ==
< t:2021-02-02 14:38:57,808 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[2653.70747923851/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/5824079 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:41:16,517 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1144310/6333107 ==
< t:2021-02-02 14:41:16,518 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[2792.416520357132/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1144310/6333107 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:43:36,492 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1144310/6652175 ==
< t:2021-02-02 14:43:36,492 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[2932.3915326595306/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1144310/6652175 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:46:08,000 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 984475/6158596 ==
< t:2021-02-02 14:46:08,000 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[3083.8988795280457/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 984475/6158596 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:48:58,215 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1148175/7051212 ==
< t:2021-02-02 14:48:58,215 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[3254.1145927906036/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1148175/7051212 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:51:24,049 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1048733/6952274 ==
< t:2021-02-02 14:51:24,049 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[3399.9478480815887/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1048733/6952274 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:53:53,908 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 878938/6552516 ==
< t:2021-02-02 14:53:53,908 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[3549.806868314743/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 878938/6552516 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:56:21,508 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 870453/6671281 ==
< t:2021-02-02 14:56:21,508 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[3697.406831264496/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 870453/6671281 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:58:46,342 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 853205/6697584 ==
< t:2021-02-02 14:58:46,342 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[3842.2413997650146/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 853205/6697584 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:01:11,880 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 848879/7576755 ==
< t:2021-02-02 15:01:11,880 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[3987.7790808677673/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 848879/7576755 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:03:35,490 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 870453/7487232 ==
< t:2021-02-02 15:03:35,490 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[4131.389223098755/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 870453/7487232 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:06:20,902 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 895384/7874602 ==
< t:2021-02-02 15:06:20,902 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[4296.800801992416/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 895384/7874602 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:08:42,948 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 848879/8378593 ==
< t:2021-02-02 15:08:42,949 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[4438.84773850441/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 848879/8378593 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:11:05,081 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 870453/8304205 ==
< t:2021-02-02 15:11:05,081 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[4580.979810476303/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 870453/8304205 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:13:24,189 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 848879/8900536 ==
< t:2021-02-02 15:13:24,189 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[4720.088034391403/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 848879/8900536 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:15:43,068 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 848879/9152973 ==
< t:2021-02-02 15:15:43,069 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[4858.967678070068/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 848879/9152973 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:18:15,759 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 878938/9341924 ==
< t:2021-02-02 15:18:15,759 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[5011.658487081528/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 878938/9341924 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:20:35,220 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 848879/9692553 ==
< t:2021-02-02 15:20:35,220 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[5151.119571447372/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 848879/9692553 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:23:05,342 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 853205/9387780 ==
< t:2021-02-02 15:23:05,342 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[5301.241162776947/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 853205/9387780 == has not been published or dumped, maybe you missed .publish()
...
< t:2021-02-02 22:56:12,256 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[32488.155284643173/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1008115/45403397 == has not been published or dumped, maybe you missed .publish()
In the other, successful test, it reported almost the same number for both tables at the end of the write stress:
< t:2021-01-29 00:28:39,859 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[32459.151389598846/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 40845578/40845538 == has not been published or dumped, maybe you missed .publish()
I guess we'd better look at the KCL log for stream failures after the point of the decommission.
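For reference, the partition counts in those log lines are per-table estimates from nodetool cfstats; a minimal sketch of reproducing the comparison by hand (the host and keyspace/table names are illustrative placeholders, not the exact SCT code):

```python
import re
import subprocess

def estimated_partitions(host, keyspace_table):
    # Parse the partition-count estimate for one table out of nodetool cfstats.
    out = subprocess.check_output(
        ['nodetool', '-h', host, 'cfstats', keyspace_table], text=True)
    m = re.search(r'Number of partitions \(estimate\):\s*(\d+)', out)
    return int(m.group(1)) if m else 0

src = estimated_partitions('10.0.2.194', 'alternator_usertable_streams.usertable_streams')
dst = estimated_partitions('10.0.2.194', 'alternator_usertable_streams-dest.usertable_streams-dest')
print(f'== dst table/src table number of partitions: {dst}/{src} ==')
```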
An updated problem scenario is:
[ForkJoinPool-1-worker-30] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:9b7bd2e14eb4b787aab386d4d4000a81
kcl-l0-c0-307f6c62-24a2-4c86-a98b-d3c4cdaeefc6.log
KCL log has:
Creating table usertable_streams
Creating table usertable_streams-dest
2021-02-04 15:03:18.626 [main] INFO StreamsAdapterDemo - Table is active.
2021-02-04 15:03:18.628 [main] INFO StreamsAdapterDemo - Table is active.
2021-02-04 15:03:18.638 [main] INFO StreamsAdapterDemo - Creating worker for stream: S1aee8af0-66fa-11eb-b03f-0c9d930cf12c
2021-02-04 15:03:18.646 [main] WARN c.a.s.k.c.lib.worker.Worker - No configuration received for endpoint and region, will default region to us-east-1
2021-02-04 15:03:18.677 [main] INFO StreamsAdapterDemo - Starting worker...
2021-02-04 15:03:18.677 [main] INFO StreamsAdapterDemo - Sleeping for 32400sec
2021-02-04 15:16:34.841 [ForkJoinPool-1-worker-18] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:00671f8de2f6d6a7f8c22a5d7c003191
2021-02-04 15:17:07.376 [ForkJoinPool-1-worker-11] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:c8f0ed125ef6504451c19d4af8001e21
2021-02-04 15:17:35.499 [ForkJoinPool-1-worker-15] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:1c48d7777352dae5f3b04f5ccc003cc1
2021-02-04 15:18:13.027 [ForkJoinPool-1-worker-23] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:9d73d46ec93094f3aab721e3c8000b61
2021-02-04 15:18:42.285 [ForkJoinPool-1-worker-22] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:b7c84c581ef7d5f4bc30429b9c001691
2021-02-04 15:19:12.073 [ForkJoinPool-1-worker-17] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:6f5a53eee91357bd59af546cdc005ab1
2021-02-04 15:19:40.071 [ForkJoinPool-1-worker-30] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:9b7bd2e14eb4b787aab386d4d4000a81
See the attached Grafana screenshots for these Alternator metrics, and see also the KCL log file.
One addition: while seeing that DescribeStream is being called again and again, we tried it manually to see the output:
aws dynamodbstreams describe-stream --stream-arn S1aee8af0-66fa-11eb-b03f-0c9d930cf12c --endpoint-url http://52.17.134.59:8080
{
"StreamDescription": {
"StreamArn": "S1aee8af0-66fa-11eb-b03f-0c9d930cf12c",
"StreamStatus": "ENABLED",
"StreamViewType": "NEW_IMAGE",
"TableName": "usertable_streams",
"KeySchema": [
{
"AttributeName": "p",
"KeyType": "HASH"
}
],
"Shards": [
{
"ShardId": "H1776d8470d7:003e003ecc0e5ca8b3f18756c0003181",
"SequenceNumberRange": {
"StartingSequenceNumber": "2551506219633207326769402383851094144",
"EndingSequenceNumber": "2551506498337500371376796833832534144"
}
},
... more similar looking shards with the same start and end ....
],
"LastEvaluatedShardId": "H1776d8470d7:01049249249249257aa9593f240031f1"
}
}
I'm guessing (since GetShardIterator wasn't called again) that KCL isn't o.k. with those sequence numbers, or that they aren't updated and it compares them with sequences it has already been reading. (I didn't yet look up the origin of the "Dropped lease for shutting down" print, to try to find the logic that leads to it.)
@fruch - I would like to see the full dump of the above stream description. And also continued after the last-evaluated shard id.
@elcallio I'm spinning it again, so you can see it live: https://jenkins.scylladb.com/job/scylla-staging/job/Longevity_yaron/job/longevity-alternator-streams-with-nemesis/18/
I'll send details once it reaches that point; I don't have dumps of the API from those runs.
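For what it's worth, the full dump (continuing after LastEvaluatedShardId, as requested) takes only a few lines of boto3; a sketch, reusing the endpoint and ARN from the manual describe-stream example above (the credentials are placeholders):

```python
import json

import boto3

streams = boto3.client('dynamodbstreams',
                       endpoint_url='http://52.17.134.59:8080',
                       region_name='us-east-1',
                       aws_access_key_id='alternator',
                       aws_secret_access_key='secret_pass')

arn = 'S1aee8af0-66fa-11eb-b03f-0c9d930cf12c'
shards, kwargs = [], {}
while True:
    desc = streams.describe_stream(StreamArn=arn, **kwargs)['StreamDescription']
    shards.extend(desc['Shards'])
    last = desc.get('LastEvaluatedShardId')
    if not last:                               # no more pages
        break
    kwargs = {'ExclusiveStartShardId': last}   # continue after last-eval-id

open_shards = [s for s in shards
               if 'EndingSequenceNumber' not in s['SequenceNumberRange']]
print(f'{len(shards)} shards total, {len(open_shards)} open')
print(json.dumps(shards, indent=2))
```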
I thought maybe this issue is related to the fact that we weren't using checkpoints, and I tried it with this change: https://github.com/fruch/hydra-kcl/pull/4
It didn't help, and we still lost the streaming the first time the topology was changed.
@nyh @elcallio, while @yarongilor is working on a dtest reproducer for this, we need more ideas on how to get to the bottom of this...
I'd say add verbose logging to the shard iteration inside KCL, i.e. print what you query, the result, and where you go from there. It seems you get stuck on either an old generation (why so many? nemesis node kills?) or ids that do not validate.
@elcallio care to give me a hand with which parts you think the logs would be helpful for, before I give it a go? https://github.com/fruch/hydra-kcl/blob/master/src/main/resources/logback.xml#L25
Sorry, you will probably need to read the library code to know that. It's not in my head.
@elcallio, @nyh, @fruch, please advise whether the above draft-reproducer PR helps in some way to get more reproduction info. (https://github.com/scylladb/scylla-dtest/pull/1977)
@elcallio After looking again into the docs of DynamoDB Streams:
> EndingSequenceNumber
> The ending sequence number for the range. Shards that are in the OPEN state have an ending sequence number of null.
Could it be that once we do a topology change, all EndingSequenceNumbers are not null? i.e. KCL might consider all of the shards closed?
The last CDC generation should always generate a set of shards with end = null (not set). We don't really have a unit test for this, but we should. It is worth noting that sequence numbers are considered shard- and generation-local, i.e. they can appear in more than one shard and generation (they are, after all, just timestamps in our case). I think this is the case even in original DynamoDB Streams.
That's exactly why we are trying to build a reproducer in dtest (or test it in SCT with the KCL, to begin with).
@elcallio would it help to bring up a cluster again, to verify this directly? If so, please be in touch with @yarongilor to spin it back up as needed.
> The last CDC generation should always generate a set of shards with end = null (not set). We don't really have a unit test for this, but should.
I think this can be easily done as a dtest - you create a cluster, list the stream's shards and remember them. Now you add a node to the cluster, and list the shards again. This list should contain all the old shards, plus some new ones. All the new ones should have no EndingSequenceNumber (and I think all the old ones should have an EndingSequenceNumber now).
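A minimal sketch of that dtest check, assuming a list_all_shards(streams, arn) helper like the DescribeStream pager sketched above, and a hypothetical add_node_to_cluster() for the topology change:

```python
# Hypothetical helpers: list_all_shards() pages DescribeStream as sketched
# above; add_node_to_cluster() performs the topology change under test.
def open_and_closed_ids(shards):
    open_ids = {s['ShardId'] for s in shards
                if 'EndingSequenceNumber' not in s['SequenceNumberRange']}
    return open_ids, {s['ShardId'] for s in shards} - open_ids

before_open, _ = open_and_closed_ids(list_all_shards(streams, arn))
add_node_to_cluster()
after_open, after_closed = open_and_closed_ids(list_all_shards(streams, arn))

# The new generation must expose open shards (end == null)...
assert after_open - before_open, 'no new open shards after the topology change'
# ...and the previously-open shards should now be closed.
assert before_open <= after_closed, 'old-generation shards were not closed'
```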
It can be done even more simply in a unit test. The last (possibly the only) generation should always have null end sequences.
@elcallio my point is that an external test - not written in C++ with some secret access to internal information - doesn't know what "the last generation" is (right?) - so the sequence of events I described was so that the test can know what this "last generation" is: the new shards that suddenly appear after a node is added. If you have this extra internal information in the C++ unit tests then yes, it can be simpler (but it relies on implementation details).
@nyh I was talking with @yarongilor yesterday about doing exactly that in dtest (and our talk was the reason we reread the docs again).
No it doesn't. It just verifies we get one set of shards with end == null. No special knowledge - just the pure DynamoDB Streams API.
Running a small test of it, it looks like at any given moment there are 1024 open shards, while the total number of shards increases by an additional 1024 shards on each topology change. Sample test log output is:
2021-02-25 12:25:58.623781 Pre setup - Creating "user_table" table via "node1" node with "{'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}}" stream key
2021-02-25 12:25:58.727871 Creating a new table 'user_table' using node 'node1'..
2021-02-25 12:26:00.950263 The table 'user_table' successfully created..
2021-02-25 12:26:00.953993 Table's schema and configuration are: {'Table': {'AttributeDefinitions': [{'AttributeName': 'hash_key', 'AttributeType': 'S'}], 'TableName': 'user_table', 'KeySchema': [{'AttributeName': 'hash_key', 'KeyType': 'HASH'}], 'TableStatus': 'ACTIVE', 'CreationDateTime': datetime.datetime(2021, 2, 25, 12, 26, tzinfo=tzlocal()), 'TableArn': 'arn:scylla:alternator:alternator_user_table:scylla:table/user_table', 'TableId': 'd9f030c0-7753-11eb-a2e9-49dcf2e9eec8', 'BillingModeSummary': {'BillingMode': 'PAY_PER_REQUEST', 'LastUpdateToPayPerRequestDateTime': datetime.datetime(2021, 2, 25, 12, 26, tzinfo=tzlocal())}, 'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}, 'LatestStreamLabel': '"2021-02-25T12:25:58Z"', 'LatestStreamArn': 'Sda02ce60-7753-11eb-a2e9-49dcf2e9eec8'}, 'ResponseMetadata': {'HTTPStatusCode': 200, 'HTTPHeaders': {'content-length': '634', 'content-type': 'application/json', 'date': 'Thu, 25 Feb 2021 10:26:00 GMT', 'server': 'Seastar httpd'}, 'RetryAttempts': 0}}
2021-02-25 12:26:00.954651 Adding new 400 items to table 'user_table'..
2021-02-25 12:26:01.271578 trying get_stream_arn [1/60] (The stream ARN of 'user_table' table not found)
2021-02-25 12:26:01.291290 Waiting until stream of "user_table" table be active
2021-02-25 12:26:01.291470 trying get_stream_arn [1/60] (The stream ARN of 'user_table' table not found)
2021-02-25 12:26:01.354776 Start Alternator stress of put_table_items..
Using parameters of: {'table_name': 'user_table', 'node': <ccmlib.scylla_node.ScyllaNode object at 0x7fc6fa281d60>, 'num_of_items': 100, 'verbose': False}
2021-02-25 12:26:01.356105 Running put_table_items...
2021-02-25 12:26:01.366810 Start decommission thread of _run_decommission_add_node_loop..
2021-02-25 12:26:01.375397 Running _run_decommission_add_node_loop...
2021-02-25 12:26:01.375629 Decommissioning node4..
2021-02-25 12:26:01.968918 Shards updated status is:
2021-02-25 12:26:01.969040 Existing number of shards [0] is updated to: [1024]
2021-02-25 12:26:01.970111 Existing number of open shards [0] is updated to: [1024]
2021-02-25 12:26:03.090101 Shards updated status is:
2021-02-25 12:26:03.090330 Existing number of shards [1024] is updated to: [1024]
2021-02-25 12:26:03.094255 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:26:18.655721 Shards updated status is:
2021-02-25 12:26:18.655846 Existing number of shards [1024] is updated to: [1024]
2021-02-25 12:26:18.657774 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:26:24.153712 Adding new node to cluster..
2021-02-25 12:26:34.349791 Shards updated status is:
2021-02-25 12:26:34.349935 Existing number of shards [1024] is updated to: [1024]
2021-02-25 12:26:34.352423 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:26:49.906273 Shards updated status is:
2021-02-25 12:26:49.906378 Existing number of shards [1024] is updated to: [1024]
2021-02-25 12:26:49.908600 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:27:05.699200 Shards updated status is:
2021-02-25 12:27:05.699389 Existing number of shards [1024] is updated to: [2048]
2021-02-25 12:27:05.703279 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:27:21.920970 Shards updated status is:
2021-02-25 12:27:21.921084 Existing number of shards [2048] is updated to: [2048]
2021-02-25 12:27:21.927367 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:27:28.666618 Node successfully added!
2021-02-25 12:27:37.658836 Shards updated status is:
2021-02-25 12:27:37.658948 Existing number of shards [2048] is updated to: [2048]
2021-02-25 12:27:37.662927 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:27:43.681446 Decommissioning node5..
2021-02-25 12:27:53.811384 Shards updated status is:
2021-02-25 12:27:53.811537 Existing number of shards [2048] is updated to: [2048]
2021-02-25 12:27:53.826324 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:28:09.144546 Adding new node to cluster..
2021-02-25 12:28:10.038065 Shards updated status is:
2021-02-25 12:28:10.038279 Existing number of shards [2048] is updated to: [2048]
2021-02-25 12:28:10.043298 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:28:26.074693 Shards updated status is:
2021-02-25 12:28:26.074900 Existing number of shards [2048] is updated to: [2048]
2021-02-25 12:28:26.080218 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:28:43.057119 Shards updated status is:
2021-02-25 12:28:43.058037 Existing number of shards [2048] is updated to: [3072]
2021-02-25 12:28:43.064048 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:28:59.796794 Shards updated status is:
2021-02-25 12:28:59.799781 Existing number of shards [3072] is updated to: [3072]
2021-02-25 12:28:59.845306 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:29:16.598609 Node successfully added!
2021-02-25 12:29:16.630708 Shards updated status is:
2021-02-25 12:29:16.630850 Existing number of shards [3072] is updated to: [3072]
2021-02-25 12:29:16.640763 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:29:31.618500 Decommissioning node6..
2021-02-25 12:29:33.914756 Shards updated status is:
2021-02-25 12:29:33.914918 Existing number of shards [3072] is updated to: [3072]
2021-02-25 12:29:33.924050 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:29:51.056645 Shards updated status is:
2021-02-25 12:29:51.056794 Existing number of shards [3072] is updated to: [3072]
2021-02-25 12:29:51.063571 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:30:00.724011 Adding new node to cluster..
2021-02-25 12:30:07.547733 Shards updated status is:
2021-02-25 12:30:07.547888 Existing number of shards [3072] is updated to: [3072]
2021-02-25 12:30:07.556992 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:30:23.881786 Shards updated status is:
2021-02-25 12:30:23.882078 Existing number of shards [3072] is updated to: [3072]
2021-02-25 12:30:23.889359 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:30:41.056615 Shards updated status is:
2021-02-25 12:30:41.056789 Existing number of shards [3072] is updated to: [4096]
2021-02-25 12:30:41.070664 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:30:58.897867 Shards updated status is:
2021-02-25 12:30:58.898102 Existing number of shards [4096] is updated to: [4096]
2021-02-25 12:30:58.922520 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:31:09.595575 Node successfully added!
2021-02-25 12:31:15.676255 Shards updated status is:
2021-02-25 12:31:15.676392 Existing number of shards [4096] is updated to: [4096]
2021-02-25 12:31:15.687763 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:31:24.610455 Decommissioning node7..
2021-02-25 12:31:33.051741 Shards updated status is:
2021-02-25 12:31:33.051846 Existing number of shards [4096] is updated to: [4096]
2021-02-25 12:31:33.060408 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:31:53.692776 Shards updated status is:
2021-02-25 12:31:53.692905 Existing number of shards [4096] is updated to: [4096]
2021-02-25 12:31:53.705916 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:32:01.604710 Adding new node to cluster..
2021-02-25 12:32:10.162478 Shards updated status is:
2021-02-25 12:32:10.162786 Existing number of shards [4096] is updated to: [4096]
2021-02-25 12:32:10.171605 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:32:27.811088 Shards updated status is:
2021-02-25 12:32:27.811302 Existing number of shards [4096] is updated to: [4096]
2021-02-25 12:32:27.819995 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:32:45.537272 Shards updated status is:
2021-02-25 12:32:45.537446 Existing number of shards [4096] is updated to: [5120]
2021-02-25 12:32:45.550380 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:33:06.497300 Shards updated status is:
2021-02-25 12:33:06.502508 Existing number of shards [5120] is updated to: [5120]
2021-02-25 12:33:06.521749 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:33:12.097748 Node successfully added!
A possible reproduction was encountered with a revised test. @elcallio, @fruch, @nyh, please advise. The reproduction test scenario is (see the sketch after this list):
- create a table for Streams.
- run an Alternator write-stress thread in the background.
- run another background thread that decommissions a node, then adds a new node.
- run a main loop that calls the Streams APIs, checking for existing open shards and for a diff in the open-shards records (to verify a change from the previous cycle).
The test searches for records of open shards only. It gets their shard iterators using the TRIM_HORIZON type, then queries for their records. The first few cycles returned record content; then, after several cycles, none of the shard iterators returned a response with records, so an assertion for existing records failed the test.
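A condensed sketch of that record check, again assuming the hypothetical list_all_shards() pager helper from earlier:

```python
# Read records from open shards only, starting each shard at TRIM_HORIZON.
def read_open_shard_records(streams, arn):
    records = []
    for shard in list_all_shards(streams, arn):
        if 'EndingSequenceNumber' in shard['SequenceNumberRange']:
            continue                                   # skip closed shards
        it = streams.get_shard_iterator(
            StreamArn=arn,
            ShardId=shard['ShardId'],
            ShardIteratorType='TRIM_HORIZON')['ShardIterator']
        records.extend(streams.get_records(ShardIterator=it)['Records'])
    return records

# The assertion that fails after a few cycles once a decommission has run:
assert read_open_shard_records(streams, arn), 'No open shards records found'
```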
test can be found in: https://github.com/yarongilor/scylla-dtest/blob/b2e9d608d70910ed31bfc7da8c6d2f1441e332c7/alternator_stream_tests.py#L97
test output:
2021-02-28 22:07:45.470138 - Starting Scylla cluster version master_2021-02-15T11_11_32Z
2021-02-28 22:07:45.473380 - Allocated cluster ID 44: /home/yarongilor/.dtest/dtest-zvtxwjw5
2021-02-28 22:07:45.532513 - Scylla mode is 'release'
2021-02-28 22:07:45.532611 - Cluster *_request_timeout_in_ms=10000, range_request_timeout_in_ms=30000, cql request_timeout=30
2021-02-28 22:07:45.540136 - Starting node 1
2021-02-28 22:08:12.301486 - Starting node 2
2021-02-28 22:09:14.807171 - Starting node 3
2021-02-28 22:10:17.755874 - Starting node 4
2021-02-28 22:11:21.236767 - Pre setup - Creating "user_table" table via "node1" node with "{'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}}" stream key
2021-02-28 22:11:28.528368 - Creating a new table 'user_table' using node 'node1'..
2021-02-28 22:11:30.141509 - The table 'user_table' successfully created..
2021-02-28 22:11:30.144094 - Table's schema and configuration are: {'Table': {'AttributeDefinitions': [{'AttributeName': 'hash_key', 'AttributeType': 'S'}], 'TableName': 'user_table', 'KeySchema': [{'AttributeName': 'hash_key', 'KeyType': 'HASH'}], 'TableStatus': 'ACTIVE', 'CreationDateTime': datetime.datetime(2021, 2, 28, 22, 11, 30, tzinfo=tzlocal()), 'TableArn': 'arn:scylla:alternator:alternator_user_table:scylla:table/user_table', 'TableId': '242e4460-7a01-11eb-8ad0-a0dea5f52420', 'BillingModeSummary': {'BillingMode': 'PAY_PER_REQUEST', 'LastUpdateToPayPerRequestDateTime': datetime.datetime(2021, 2, 28, 22, 11, 30, tzinfo=tzlocal())}, 'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}, 'LatestStreamLabel': '"2021-02-28T22:11:28Z"', 'LatestStreamArn': 'S2430dc70-7a01-11eb-8ad0-a0dea5f52420'}, 'ResponseMetadata': {'HTTPStatusCode': 200, 'HTTPHeaders': {'content-length': '634', 'content-type': 'application/json', 'date': 'Sun, 28 Feb 2021 20:11:29 GMT', 'server': 'Seastar httpd'}, 'RetryAttempts': 0}}
2021-02-28 22:11:30.144298 - Adding new 100 items to table 'user_table'..
2021-02-28 22:11:30.246841 - trying get_stream_arn [1/60] (The stream ARN of 'user_table' table not found)
2021-02-28 22:11:30.260607 - Start Alternator stress of put_table_items..
Using parameters of: {'table_name': 'user_table', 'node': <ccmlib.scylla_node.ScyllaNode object at 0x7f8ce8563b50>, 'num_of_items': 1000, 'verbose': False}
2021-02-28 22:11:30.260865 - Running put_table_items...
2021-02-28 22:11:30.265326 - Start decommission thread of run_decommission_then_add_node..
2021-02-28 22:11:30.265589 - Running run_decommission_then_add_node...
2021-02-28 22:11:30.265672 - Decommissioning node4..
2021-02-28 22:11:30.851040 - Shards updated status is:
2021-02-28 22:11:30.851150 - Existing number of shards [0] is updated to: [1024]
2021-02-28 22:11:30.851685 - Existing number of open shards [0] is updated to: [1024]
2021-02-28 22:11:31.531979 - Shards updated status is:
2021-02-28 22:11:31.532030 - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:11:31.532526 - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:11:34.274969 - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:11:52.845465 - Adding new node to cluster..
2021-02-28 22:11:56.441125 - Found 15212 records for 1024 shard iterators.
2021-02-28 22:11:56.989906 - Records diff is:
2021-02-28 22:11:56.990005 - iterable_item_added: 996
2021-02-28 22:12:29.731076 - Shards updated status is:
2021-02-28 22:12:29.731142 - Existing number of shards [1024] is updated to: [2048]
2021-02-28 22:12:29.731875 - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:12:32.507168 - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:12:55.672117 - Found 0 records for 1024 shard iterators.
2021-02-28 22:12:55.735712 - Test failed with errors: [(<alternator_stream_tests.AlternatorStreamsTest testMethod=test_updated_shards_during_add_decommission_node>, (<class 'AssertionError'>, AssertionError('No open shards records found'), <traceback object at 0x7f8cc66051c0>))]
As a comparison/validation check, the issue is not reproduced when running the same test without the background decommission thread:
2021-02-28 22:38:34.657438 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 1
2021-02-28 22:39:01.084446 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 2
2021-02-28 22:40:03.062533 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 3
2021-02-28 22:41:05.445600 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 4
2021-02-28 22:42:07.675809 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Pre setup - Creating "user_table" table via "node1" node with "{'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}}" stream key
2021-02-28 22:42:07.873418 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Creating a new table 'user_table' using node 'node1'..
2021-02-28 22:42:09.467958 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - The table 'user_table' successfully created..
2021-02-28 22:42:09.471575 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Table's schema and configuration are: {'Table': {'AttributeDefinitions': [{'AttributeName': 'hash_key', 'AttributeType': 'S'}], 'TableName': 'user_table', 'KeySchema': [{'AttributeName': 'hash_key', 'KeyType': 'HASH'}], 'TableStatus': 'ACTIVE', 'CreationDateTime': datetime.datetime(2021, 2, 28, 22, 42, 9, tzinfo=tzlocal()), 'TableArn': 'arn:scylla:alternator:alternator_user_table:scylla:table/user_table', 'TableId': '6c814b50-7a05-11eb-96a3-b9b283df594d', 'BillingModeSummary': {'BillingMode': 'PAY_PER_REQUEST', 'LastUpdateToPayPerRequestDateTime': datetime.datetime(2021, 2, 28, 22, 42, 9, tzinfo=tzlocal())}, 'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}, 'LatestStreamLabel': '"2021-02-28T22:42:07Z"', 'LatestStreamArn': 'S6c834720-7a05-11eb-96a3-b9b283df594d'}, 'ResponseMetadata': {'HTTPStatusCode': 200, 'HTTPHeaders': {'content-length': '634', 'content-type': 'application/json', 'date': 'Sun, 28 Feb 2021 20:42:09 GMT', 'server': 'Seastar httpd'}, 'RetryAttempts': 0}}
2021-02-28 22:42:09.471882 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new 100 items to table 'user_table'..
2021-02-28 22:42:09.521999 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - trying get_stream_arn [1/60] (The stream ARN of 'user_table' table not found)
2021-02-28 22:42:09.537825 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Start Alternator stress of put_table_items..
Using parameters of: {'table_name': 'user_table', 'node': <ccmlib.scylla_node.ScyllaNode object at 0x7f230203fb50>, 'num_of_items': 1000, 'verbose': False}
2021-02-28 22:42:09.538091 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Running put_table_items...
2021-02-28 22:42:10.027383 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:42:10.027471 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [0] is updated to: [1024]
2021-02-28 22:42:10.027745 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [0] is updated to: [1024]
2021-02-28 22:42:10.655643 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:42:10.655712 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:42:10.656377 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:42:13.513601 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:42:35.614614 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 26032 records for 1024 shard iterators.
2021-02-28 22:42:36.288177 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:42:36.288290 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 1000
2021-02-28 22:43:07.416283 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:43:07.416910 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:43:07.419720 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:43:10.234157 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:43:44.411341 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 39040 records for 1024 shard iterators.
2021-02-28 22:43:46.314524 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:43:46.314624 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 142
2021-02-28 22:44:16.667640 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:44:16.667731 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:44:16.668326 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:44:19.554951 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:44:52.128172 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 25090 records for 1024 shard iterators.
2021-02-28 22:44:53.557562 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:44:53.557629 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 126
2021-02-28 22:44:53.557663 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 2
2021-02-28 22:45:23.951322 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:45:23.951387 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:45:23.952206 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:45:26.932139 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:46:16.035860 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 52832 records for 1024 shard iterators.
2021-02-28 22:46:18.370925 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:46:18.370995 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 17
2021-02-28 22:46:18.371032 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 67
2021-02-28 22:46:48.794302 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:46:48.794361 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:46:48.795071 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:46:51.822724 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:47:59.030950 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 56824 records for 1024 shard iterators.
2021-02-28 22:48:01.879236 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:48:01.879312 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 68
2021-02-28 22:48:01.882410 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 3
2021-02-28 22:48:32.377957 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:48:32.378040 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:48:32.378848 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:48:35.215964 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:49:14.534186 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 31260 records for 1024 shard iterators.
2021-02-28 22:49:16.710381 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:49:16.710461 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 3
2021-02-28 22:49:16.710504 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 45
2021-02-28 22:49:47.184573 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:49:47.185068 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:49:47.185831 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:49:49.818557 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:50:34.829131 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 37663 records for 1024 shard iterators.
2021-02-28 22:50:36.677670 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:50:36.681053 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 45
2021-02-28 22:51:07.139664 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:51:07.139717 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:51:07.140604 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:51:10.049777 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:52:04.786331 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 53015 records for 1024 shard iterators.
2021-02-28 22:52:07.038115 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:52:07.038958 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 45
2021-02-28 22:52:37.534385 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:52:37.534484 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:52:37.535195 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:52:40.227533 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:53:24.571288 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 45339 records for 1024 shard iterators.
2021-02-28 22:53:27.635824 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:53:27.635894 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 45
2021-02-28 22:53:58.093358 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:53:58.093408 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:53:58.094448 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:54:01.123506 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:54:56.846935 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 68625 records for 1024 shard iterators.
2021-02-28 22:54:59.733642 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:54:59.733701 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 3
2021-02-28 22:55:29.826478 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Stopping cluster
2021-02-28 22:56:24.757699 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - stopping ccm cluster test at: /home/yarongilor/.dtest/dtest-2vfds6ri
2021-02-28 22:56:24.829742 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Freeing cluster ID 39: link /home/yarongilor/.dtest/39
on another run, we got a read-timeout exception from Streams trying to read the CDC table:
2021-02-28 23:35:18.981871 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - cluster ccm directory: /home/yarongilor/.dtest/dtest-3sy6s1g9
2021-02-28 23:35:18.981973 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting Scylla cluster version master_2021-02-15T11_11_32Z
test_updated_shards_during_add_decommission_node (alternator_stream_tests.AlternatorStreamsTest) ... 2021-02-28 23:35:19.008124 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Allocated cluster ID 34: /home/yarongilor/.dtest/dtest-3sy6s1g9
2021-02-28 23:35:19.089650 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Scylla mode is 'release'
2021-02-28 23:35:19.089737 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Cluster *_request_timeout_in_ms=10000, range_request_timeout_in_ms=30000, cql request_timeout=30
2021-02-28 23:35:19.112288 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 1
2021-02-28 23:35:55.552312 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 2
2021-02-28 23:36:57.787020 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 3
2021-02-28 23:38:00.381024 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 4
2021-02-28 23:39:02.954116 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Pre setup - Creating "user_table" table via "node1" node with "{'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}}" stream key
2021-02-28 23:39:03.035801 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Creating a new table 'user_table' using node 'node1'..
2021-02-28 23:39:05.855278 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - The table 'user_table' successfully created..
2021-02-28 23:39:05.858662 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Table's schema and configuration are: {'Table': {'AttributeDefinitions': [{'AttributeName': 'hash_key', 'AttributeType': 'S'}], 'TableName': 'user_table', 'KeySchema': [{'AttributeName': 'hash_key', 'KeyType': 'HASH'}], 'TableStatus': 'ACTIVE', 'CreationDateTime': datetime.datetime(2021, 2, 28, 23, 39, 5, tzinfo=tzlocal()), 'TableArn': 'arn:scylla:alternator:alternator_user_table:scylla:table/user_table', 'TableId': '6019d000-7a0d-11eb-89a3-c38ffacd3ea6', 'BillingModeSummary': {'BillingMode': 'PAY_PER_REQUEST', 'LastUpdateToPayPerRequestDateTime': datetime.datetime(2021, 2, 28, 23, 39, 5, tzinfo=tzlocal())}, 'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}, 'LatestStreamLabel': '"2021-02-28T23:39:03Z"', 'LatestStreamArn': 'S603c7330-7a0d-11eb-89a3-c38ffacd3ea6'}, 'ResponseMetadata': {'HTTPStatusCode': 200, 'HTTPHeaders': {'content-length': '634', 'content-type': 'application/json', 'date': 'Sun, 28 Feb 2021 21:39:05 GMT', 'server': 'Seastar httpd'}, 'RetryAttempts': 0}}
2021-02-28 23:39:05.859767 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new 100 items to table 'user_table'..
2021-02-28 23:39:05.947565 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - trying get_stream_arn [1/60] (The stream ARN of 'user_table' table not found)
2021-02-28 23:39:05.965397 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Start Alternator stress of put_table_items..
Using parameters of: {'table_name': 'user_table', 'node': <ccmlib.scylla_node.ScyllaNode object at 0x7f58c7ed5d30>, 'num_of_items': 1000, 'verbose': False}
2021-02-28 23:39:05.966150 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Running put_table_items...
2021-02-28 23:39:05.979404 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Start decommission thread of run_decommission_then_add_node..
2021-02-28 23:39:05.979953 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Running run_decommission_then_add_node...
2021-02-28 23:39:05.980104 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Decommissioning node4..
2021-02-28 23:39:06.486024 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:39:06.486388 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [0] is updated to: [1024]
2021-02-28 23:39:06.487496 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [0] is updated to: [1024]
2021-02-28 23:39:06.487623 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #1..
2021-02-28 23:39:07.150922 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:39:07.151082 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 23:39:07.153275 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:39:12.061712 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:39:27.970388 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new node to cluster..
2021-02-28 23:39:38.743387 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 19384 records for 2048 shard iterators.
2021-02-28 23:39:40.547916 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 23:39:40.548158 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 1000
2021-02-28 23:40:10.577976 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #2..
2021-02-28 23:40:11.678732 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:40:11.679013 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [2048]
2021-02-28 23:40:11.693952 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:40:22.260551 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:40:33.028726 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Node successfully added!
2021-02-28 23:40:38.040240 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Decommissioning node5..
2021-02-28 23:40:44.368490 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 4882 records for 3072 shard iterators.
2021-02-28 23:40:46.421692 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 23:40:46.421884 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 139
2021-02-28 23:41:06.153941 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new node to cluster..
2021-02-28 23:41:16.451749 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #3..
2021-02-28 23:41:17.350494 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:41:17.350691 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [2048] is updated to: [2048]
2021-02-28 23:41:17.356670 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:41:22.445334 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:42:12.347512 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Node successfully added!
2021-02-28 23:42:17.361723 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Decommissioning node6..
2021-02-28 23:42:25.899223 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 49106 records for 2048 shard iterators.
2021-02-28 23:42:30.233951 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 23:42:30.234143 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 139
2021-02-28 23:42:43.404626 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new node to cluster..
2021-02-28 23:43:00.268331 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #4..
2021-02-28 23:43:04.654017 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:43:04.654134 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [2048] is updated to: [3072]
2021-02-28 23:43:04.659551 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:43:09.391361 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:44:01.182274 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Node successfully added!
2021-02-28 23:44:06.205544 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Decommissioning node7..
2021-02-28 23:44:13.097143 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 37920 records for 2048 shard iterators.
2021-02-28 23:44:19.066895 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 23:44:19.067013 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 3
2021-02-28 23:44:41.554486 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new node to cluster..
2021-02-28 23:44:49.083946 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #5..
2021-02-28 23:44:57.273254 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:44:57.273380 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [3072] is updated to: [4096]
2021-02-28 23:44:57.283666 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:45:02.187139 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:45:51.297199 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Node successfully added!
2021-02-28 23:45:56.302194 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Decommissioning node8..
2021-02-28 23:46:23.336680 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 40546 records for 2048 shard iterators.
2021-02-28 23:46:26.214606 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new node to cluster..
2021-02-28 23:46:29.565062 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 23:46:29.565188 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 3
2021-02-28 23:46:59.594960 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #6..
2021-02-28 23:47:01.887296 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:47:01.887970 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [4096] is updated to: [6144]
2021-02-28 23:47:01.898940 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:47:06.377222 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:47:46.237578 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 6 records for 3072 shard iterators.
2021-02-28 23:47:50.533481 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 23:47:50.533633 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 997
2021-02-28 23:47:55.558881 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Node successfully added!
2021-02-28 23:48:00.568826 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Decommissioning node9..
2021-02-28 23:48:20.563822 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #7..
2021-02-28 23:48:52.337358 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:48:52.337489 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [6144] is updated to: [6144]
2021-02-28 23:48:52.350800 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:48:57.148758 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:48:59.669771 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new node to cluster..
2021-02-28 23:50:19.479974 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Node successfully added!
2021-02-28 23:50:24.484702 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Decommissioning node9..
2021-02-28 23:50:59.958509 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 32596 records for 2048 shard iterators.
2021-02-28 23:51:01.659345 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 23:51:01.659591 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 997
2021-02-28 23:51:31.689832 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #8..
2021-02-28 23:51:36.203591 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:51:36.203970 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [6144] is updated to: [7168]
2021-02-28 23:51:36.238262 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:51:38.935224 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:52:44.332810 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Test failed with errors: [(<alternator_stream_tests.AlternatorStreamsTest testMethod=test_updated_shards_during_add_decommission_node>, (<class 'botocore.errorfactory.InternalServerError'>, InternalServerError('An error occurred (InternalServerError) when calling the GetRecords operation (reached max retries: 4): Internal server error: exceptions::read_timeout_exception (Operation timed out for alternator_user_table.user_table_scylla_cdc_log - received only 1 responses from 2 CL=LOCAL_QUORUM.)'), <traceback object at 0x7f58920b1140>))]
2021-02-28 23:52:44.333235 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Stopping cluster
2021-03-01 00:00:16.769839 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Test failed with errors: [(<alternator_stream_tests.AlternatorStreamsTest testMethod=test_updated_shards_during_add_decommission_node>, (<class 'botocore.errorfactory.InternalServerError'>, InternalServerError('An error occurred (InternalServerError) when calling the GetRecords operation (reached max retries: 4): Internal server error: exceptions::read_timeout_exception (Operation timed out for alternator_user_table.user_table_scylla_cdc_log - received only 1 responses from 2 CL=LOCAL_QUORUM.)'), <traceback object at 0x7f58920b1140>)), (<alternator_stream_tests.AlternatorStreamsTest testMethod=test_updated_shards_during_add_decommission_node>, (<class 'ccmlib.node.NodeError'>, NodeError('Problem stopping node node1'), <traceback object at 0x7f5893ebf980>)), (<alternator_stream_tests.AlternatorStreamsTest testMethod=test_updated_shards_during_add_decommission_node>, (<class 'ccmlib.node.NodetoolError'>, NodetoolError("Nodetool command '/home/yarongilor/.ccm/scylla-repository/master_2021-02-15T11_11_32Z/scylla-tools-java/bin/nodetool -h 127.0.34.9 -p 7934 decommission' failed; exit status: 1; stdout: nodetool: Scylla API server HTTP POST to URL '/storage_service/decommission' failed: std::runtime_error (local node is not a member of the token ring yet)\nSee 'nodetool help' or 'nodetool help <command>'.\n"), <traceback object at 0x7f5893ebff40>)), (<alternator_stream_tests.AlternatorStreamsTest testMethod=test_updated_shards_during_add_decommission_node>, (<class 'botocore.exceptions.EndpointConnectionError'>, EndpointConnectionError('Could not connect to the endpoint URL: "http://127.0.34.1:8080/"'), <traceback object at 0x7f5893ebf3c0>))]
2021-03-01 00:00:16.770248 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Stopping cluster
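For reference, the "(reached max retries: 4)" in that error is botocore's retry message, so the retry budget is a client-side knob. A minimal sketch - assuming boto3 is used to talk to Alternator, with a hypothetical endpoint and dummy credentials - of giving the dynamodbstreams client a larger retry budget and a longer read timeout, so a coordinator briefly missing quorum during a topology change is retried instead of failing the run:

import boto3
from botocore.config import Config

# Hypothetical endpoint - in these tests it would be one of the ccm nodes.
cfg = Config(retries={'max_attempts': 10, 'mode': 'standard'},
             connect_timeout=10, read_timeout=60)
streams = boto3.client('dynamodbstreams',
                       endpoint_url='http://127.0.0.1:8080',
                       region_name='us-east-1',
                       aws_access_key_id='alternator',   # dummy credentials
                       aws_secret_access_key='secret',   # for a local cluster
                       config=cfg)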
Why would you (a) be sure to even get data in the new, open shards? This is the new CDC generation, and it could quite possibly be wholly empty at this point in time, depending on cluster propagation and the rate of data insertion into the base table. (b) Assume that data from a previous read of open shards would be equal to reading from a new set? First off, if data is being added to the base table, you would get new records. Second, if we change topology, we get a new set of open shards, whose data is only as old as the topology change.
Remember: topology changes cause the CDC stream ids (shards in Alternator) to be re-created anew. But all old data (created before the change) will remain with its old shards. Nothing moves. It is just that new data will go into the new partitioning.
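To make the consequence for a reader concrete: after a topology change the stream has to be re-enumerated, and iterators opened on shards not seen before, while the old (now closed) shards are still drained for their remaining data. A rough sketch, assuming the boto3 streams client from the sketch above, and ignoring DescribeStream pagination for brevity (a paged loop is sketched further down the thread):

def new_open_shards(streams, stream_arn, seen_ids):
    desc = streams.describe_stream(StreamArn=stream_arn)['StreamDescription']
    fresh = {}
    for shard in desc['Shards']:
        sid = shard['ShardId']
        # A shard with no EndingSequenceNumber is still open; shards of the
        # pre-change generation are closed but keep all their old data.
        is_open = 'EndingSequenceNumber' not in shard['SequenceNumberRange']
        if is_open and sid not in seen_ids:
            it = streams.get_shard_iterator(StreamArn=stream_arn,
                                            ShardId=sid,
                                            ShardIteratorType='TRIM_HORIZON')
            fresh[sid] = it['ShardIterator']
    return fresh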
@elcallio , a) because a background stress thread inserts data continuously, and new data is assumed to only be found in open shards. Would it be possible that all open shards have no records at all (although writes are inserted all the time)? b) On the contrary - the assumption is that data from the previous read is different. And again, it is not asserted, only printed to the log. The expected result is getting new, different records after a topology change, confirming that Streams handle topology changes appropriately and don't get stuck somehow.
@fruch , i think we have to consider integrating KCL into dtest for these reproduction efforts, since the issues currently reproduced in dtest are a bit different from the original KCL one.
It is absolutely possible to have an empty (no data in it) shard set, especially since there is an overlap window where data writes can go to either old shards or new shards. First off, I can't see anything in the test verifying that a shard change has actually happened between diffing prev and current - you could very well be looking at the same set of shards. It is correct that if the shard set is different, the data is different, but the check does not include the shards themselves, only the records.
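A way to close that gap - purely a sketch of what the test could assert, not what it does today - is to snapshot the set of ShardIds on each cycle and only treat a records diff as meaningful when the shard sets themselves differ:

def shard_ids(streams, stream_arn):
    # NB: DescribeStream is paged (100 shards max per call); a real check
    # must follow LastEvaluatedShardId, as discussed below.
    desc = streams.describe_stream(StreamArn=stream_arn)['StreamDescription']
    return {s['ShardId'] for s in desc['Shards']}

before = shard_ids(streams, stream_arn)
# ... decommission / add a node here ...
after = shard_ids(streams, stream_arn)
if after != before:
    # compare_records is a hypothetical test helper (e.g. deepdiff-based)
    compare_records(prev_records, cur_records)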
rerunning the SCT test, it looks like there are 100 shards, all of which are closed, with no open shard:
yarongilor@yaron-pc:/$ aws dynamodbstreams describe-stream --stream S0a16ddd0-7b46-11eb-af34-e4a0ee242883 --endpoint-url http://34.243.174.71:8080 | grep '"ShardId":' -c
100
yarongilor@yaron-pc:/$ aws dynamodbstreams describe-stream --stream S0a16ddd0-7b46-11eb-af34-e4a0ee242883 --endpoint-url http://34.243.174.71:8080 | grep EndingSequenceNumber -c
100
yarongilor@yaron-pc:/$ aws dynamodbstreams describe-stream --stream S0a16ddd0-7b46-11eb-af34-e4a0ee242883 --endpoint-url http://34.243.174.71:8080 | grep StartingSequenceNumber -c
100
@elcallio , @nyh , @fruch , the cluster is alive and can be accessed for any debugging you think of. describe_stream_S0a16ddd0-7b46-11eb-af34-e4a0ee242883.txt
You can't do just "describe-stream", since it will give back max 100 entries (look above!). Like all dynamo operations, it is paged and you need to repeat the query with ExclusiveStartShardId
set to whatever the last query gave back. It is tedious as f***, which is why you typically script it.
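For reference, a small paged loop of the kind described above, assuming the same boto3 dynamodbstreams client: each DescribeStream reply carries a LastEvaluatedShardId, which is fed back as ExclusiveStartShardId until it is absent:

def list_all_shards(streams, stream_arn):
    # Collect every shard of the stream, at most 100 per call.
    shards, last_id = [], None
    while True:
        kwargs = {'StreamArn': stream_arn}
        if last_id:
            kwargs['ExclusiveStartShardId'] = last_id
        desc = streams.describe_stream(**kwargs)['StreamDescription']
        shards.extend(desc['Shards'])
        last_id = desc.get('LastEvaluatedShardId')
        if not last_id:
            return shards

shards = list_all_shards(streams, 'S0a16ddd0-7b46-11eb-af34-e4a0ee242883')
open_shards = [s for s in shards
               if 'EndingSequenceNumber' not in s['SequenceNumberRange']]
print(len(shards), 'total shards,', len(open_shards), 'open')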
right, i was just about to write a small script for it... and indeed it proves there are some open shards. they are found eventually, after going over more than 200,000 shards:
Found 100000 StartingSequenceNumber and 100000 EndingSequenceNumber
Found 100000 StartingSequenceNumber and 100000 EndingSequenceNumber
Found 25792 StartingSequenceNumber and 4288 EndingSequenceNumber
Open shards found!
the monitor can be seen here: http://54.195.131.244:3000/d/Irdtk9sGk/longevity-alternator-streams-with-nemesis-scylla-per-server-metrics-nemesis-master?orgId=1&refresh=30s
Following that, it also looks like the original problem is not reproduced with the latest master version tested here. The grafana alternator metrics do show API calls for shard iterators and records, but on the other hand, for some reason the destination table is reported by nodetool cfstats to be of zero size all along the run:
20:33:18 < t:2021-03-02 18:33:13,707 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 0/37727884 ==
@yarongilor according to the hydra-kcl log, it seems like it's still happening (even though the dashboard isn't showing exactly the thing we've seen); in this case getShardIterator is being called again and again, but you can see that getRecords isn't:
2021-03-02 10:57:13.915 [main] INFO StreamsAdapterDemo - Sleeping for 32400sec
2021-03-02 10:57:34.685 [ForkJoinPool-1-worker-14] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H177f27a7273:af10000000000000ca731e6f78000271
2021-03-02 10:57:34.827 [ForkJoinPool-1-worker-29] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H177f27a7273:1ba8000000000000e863de061c000931
2021-03-02 10:57:34.969 [ForkJoinPool-1-worker-30] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H177f27a7273:88d5b6db6db6db6e94dd675fb4000081
I think in this case the decommission took place just as the KCL started, and it failed to copy anything into the target table.
Also, one of your loaders was killed by a spot termination, so I don't know if we can get more info from this cluster before you terminate it.
That log is a bit weird. It suggests that the lease for a shard is being dropped even though it is not "closed", i.e. we've reached end-of-data (GetRecords returned no next iterator) but the mechanism cannot see the shard as "closed". My guess is that this is because not all obsolete shards (in alternator) are parents - and that is the criterion kinesis uses, not the more appropriate "EndingSequenceNumber". I am unsure if this could be an issue. Since we're at end-of-data anyway it should not, but...
The worrying part is that this will cause kinesis to not update the stream position for the shard. Which, again, should not affect anything, but...
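In other words, the robust "closed" test on the DynamoDB Streams side is the presence of EndingSequenceNumber in a shard's SequenceNumberRange, not whether some newer shard names it as ParentShardId. A sketch of the two criteria side by side (field names as returned by DescribeStream):

def is_closed(shard):
    # Closed shards have both ends of their sequence-number range set.
    return 'EndingSequenceNumber' in shard['SequenceNumberRange']

def shards_kcl_sees_as_parents(shards):
    # Roughly what the kinesis/KCL logic keys on: parent references.
    return {s['ParentShardId'] for s in shards if s.get('ParentShardId')}

# An obsolete-generation shard that no newer shard references as its
# parent is closed by the first test but invisible to the second.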
Update: I think I've managed to reproduce at least one broken behaviour. Seems there is an issue with (new) stream ids, timestamps and our bookkeeping of whether we've reached shard end or not. Investigating/trying to fix. Basically, our test condition seems to fail, and we end up saying open shards are actually ended.
KCL table dump to file after Stream got stuck.
cqlsh:alternator_streams-adapter-demo> SELECT count(*) from "streams-adapter-demo" ;
count
-------
43008
alternator_streams-adapter-demo.streams-adapter-demo.log.gz alternator_streams-adapter-demo.streams-adapter-demo.txt
Is this after one decommission and adding one node? This number correlates with the number of shards; this is quite a lot.
@slivne had a suggestion today to reproduce it with one vnode per node and a smaller machine, so it runs with far fewer shards - to try to figure out whether it works when there are fewer shards.
Here are my notes from trying to reproduce it manually (it depends on having YCSB / hydra-kcl and the latest scylla-ccm), pushing a bigger and bigger number of keys with YCSB at each step:
# omitting --vnodes here would make it run with num_tokens=1
./ccm create --scylla --vnodes --version branch_4.4.rc1 -n 3 8012_repo
./ccm updateconf alternator_write_isolation:always_use_lwt alternator_port:8080 experimental:true
./ccm start
# wait until cluster up
# in the hydra-kcl repo, run
./gradlew run --args='-e http://127.0.0.1:8080 -t usertable --timeout 32400'
# run YCSB
python2.7 ./bin/ycsb load dynamodb -P workloads/workloada -P dynamodb.properties -p recordcount=100 -p insertorder=uniform -p insertcount=100 -p fieldcount=2 -p fieldlength=50 -s
# decommission
./ccm node3 nodetool decommission
# push bigger data set
python2.7 ./bin/ycsb load dynamodb -P workloads/workloada -P dynamodb.properties -p recordcount=200 -p insertorder=uniform -p insertcount=200 -p fieldcount=2 -p fieldlength=50 -s
./ccm add --scylla -b node4
./ccm node4 start
# push bigger data set
python2.7 ./bin/ycsb load dynamodb -P workloads/workloada -P dynamodb.properties -p recordcount=300 -p insertorder=uniform -p insertcount=300 -p fieldcount=2 -p fieldlength=50 -s
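To verify between the steps above that records are still flowing on the raw streams API (rather than only watching the KCL side), a short polling pass can be dropped in after each YCSB load - reusing the streams client and list_all_shards helper sketched earlier in the thread:

import time

def drain_once(streams, stream_arn):
    # One GetRecords pass over all currently open shards (always from
    # TRIM_HORIZON, so per-shard counts are cumulative between calls).
    total = 0
    for shard in list_all_shards(streams, stream_arn):
        if 'EndingSequenceNumber' in shard['SequenceNumberRange']:
            continue  # closed shard from an older generation
        it = streams.get_shard_iterator(
            StreamArn=stream_arn, ShardId=shard['ShardId'],
            ShardIteratorType='TRIM_HORIZON')['ShardIterator']
        total += len(streams.get_records(ShardIterator=it)['Records'])
    return total

# e.g. right after the decommission and again after the next YCSB load:
for _ in range(5):
    print('records seen:', drain_once(streams, stream_arn))
    time.sleep(10)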
When I have more time to sit down with it, I might be able to pack it all into a dtest, but we'll follow up on @yarongilor's reproduction run with more debugging and with ShardSyncStrategyType.PERIODIC
(as suggested here: https://github.com/fruch/hydra-kcl/pull/5)
The issue is reproduced when running only a single decommission and using a smaller DB instance type, so there are fewer SMP cores and fewer Stream shards.
Installation details
Kernel version: 5.11.6-1.el7.elrepo.x86_64
Scylla version (or git commit hash): 4.5.dev-0.20210319.abab1d906c with build-id c60f1de6f6a94a2b9a6219edd82727f66aaf8c83
Cluster size: 6 nodes (i3.large)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0c8a8490cf80a53a0
(aws: eu-west-1)
Test: longevity-alternator-streams-with-nemesis
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Issue description
====================================
PUT ISSUE DESCRIPTION HERE
====================================
Restore Monitor Stack command: $ hydra investigate show-monitor a7453f7c-eddb-4777-9cfa-c7931bbcc2f5
Show all stored logs command: $ hydra investigate show-logs a7453f7c-eddb-4777-9cfa-c7931bbcc2f5
Test id: a7453f7c-eddb-4777-9cfa-c7931bbcc2f5
Logs:
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_101815/grafana-screenshot-alternator-20210321_102430-alternator-streams-nemesis-stream-w-monitor-node-a7453f7c-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_101815/grafana-screenshot-longevity-alternator-streams-with-nemesis-scylla-per-server-metrics-nemesis-20210321_102155-alternator-streams-nemesis-stream-w-monitor-node-a7453f7c-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_101815/grafana-screenshot-overview-20210321_101815-alternator-streams-nemesis-stream-w-monitor-node-a7453f7c-1.png
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_150530/db-cluster-a7453f7c.zip
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_150530/loader-set-a7453f7c.zip
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_150530/monitor-set-a7453f7c.zip
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_150530/sct-runner-a7453f7c.zip
Installation details
Scylla version (or git commit hash): 4.4.dev-0.20210104.d5da455d9 with build-id 56ab4a808a56049c86c5af7a0f4a680dc37f36d1
Cluster size: 6 nodes (i3.4xlarge)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-018ec229a4231314c
(aws: eu-west-1)
Test: longevity-alternator-streams-with-nemesis
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Issue description
====================================
scenario:
==> result: YCSB reported verification errors as below.
====================================
Restore Monitor Stack command: $ hydra investigate show-monitor b71161a4-182a-4447-8a35-9e9c63e89b16
Show all stored logs command: $ hydra investigate show-logs b71161a4-182a-4447-8a35-9e9c63e89b16
Test id: b71161a4-182a-4447-8a35-9e9c63e89b16
Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/b71161a4-182a-4447-8a35-9e9c63e89b16/20210105_233028/db-cluster-b71161a4.zip
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/b71161a4-182a-4447-8a35-9e9c63e89b16/20210105_233028/loader-set-b71161a4.zip
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/b71161a4-182a-4447-8a35-9e9c63e89b16/20210105_233028/monitor-set-b71161a4.zip
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/b71161a4-182a-4447-8a35-9e9c63e89b16/20210105_233028/sct-runner-b71161a4.zip
Jenkins job URL
YCSB verification error:
the YCSB insert and read commands, as found in the SCT yaml file, are:
some temporary YCSB write connection issues were noticed during the test, probably due to decommissions, but overall it reported a successful run:
the second YCSB thread, running the second range of inserts, reported successful writes as well:
the issue was also reproduced on another run, using a ChaosMonkey of multiple nemeses instead of the Decommission nemesis: