Closed by yarongilor 1 year ago
The issue is reproduced when running only a single decommission and using a smaller DB instance type, so there are fewer SMP shards and fewer Stream shards.
Installation details
Kernel version: 5.11.6-1.el7.elrepo.x86_64
Scylla version (or git commit hash): 4.5.dev-0.20210319.abab1d906c with build-id c60f1de6f6a94a2b9a6219edd82727f66aaf8c83
Cluster size: 6 nodes (i3.large)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0c8a8490cf80a53a0 (aws: eu-west-1)
Test: longevity-alternator-streams-with-nemesis
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Issue description
====================================
PUT ISSUE DESCRIPTION HERE
====================================
Restore Monitor Stack command:
$ hydra investigate show-monitor a7453f7c-eddb-4777-9cfa-c7931bbcc2f5
Show all stored logs command: $ hydra investigate show-logs a7453f7c-eddb-4777-9cfa-c7931bbcc2f5
Test id:
a7453f7c-eddb-4777-9cfa-c7931bbcc2f5
Logs:
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_101815/grafana-screenshot-alternator-20210321_102430-alternator-streams-nemesis-stream-w-monitor-node-a7453f7c-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_101815/grafana-screenshot-longevity-alternator-streams-with-nemesis-scylla-per-server-metrics-nemesis-20210321_102155-alternator-streams-nemesis-stream-w-monitor-node-a7453f7c-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_101815/grafana-screenshot-overview-20210321_101815-alternator-streams-nemesis-stream-w-monitor-node-a7453f7c-1.png
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_150530/db-cluster-a7453f7c.zip
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_150530/loader-set-a7453f7c.zip
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_150530/monitor-set-a7453f7c.zip
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_150530/sct-runner-a7453f7c.zip
@yarongilor can you share the logs from hydra-kcl? I.e., so we can see what it's stuck on, and confirm the suspected issue is what we are seeing?
Did you reproduce this with a smaller number of vnodes?
Did you reproduce this with a smaller number of vnodes?
It was reproduced with the DB instance type changed from 'i3.4xlarge' to 'i3.large'.
I need to rerun in order to get the new KCL debug logs, so if this setting is good enough, I'll rerun again.
Please run with fewer vnodes; otherwise it will still be hard.
The parameter to change is num_tokens, which by default is set to 256. Please change it to 16 and let's see what you get.
@yarongilor while reproducing on a smaller node is very good (it will be cheaper to run the test, and to leave live nodes for debugging), @slivne's question was on vnodes, i.e., the number of tokens per node. This is the "num_tokens" configuration parameter of Scylla. Its default is (IIRC) 256, and it can be set much lower (even 1).
One of the theories was that there is a huge number of new "shards" (in the DynamoDB Streams sense, i.e., partitions in the CDC log) and that the KCL is taking a long time to process them instead of doing useful work (so it's a sort of livelock: KCL is working but not making any progress). The number of "shards" (CDC partitions) is proportional to the number of vnodes (num_tokens), so reducing it from 256 to (say) 1 can lower this amount of work, and maybe we'll see KCL making progress then.
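If it helps to check that theory directly, here is a minimal sketch (my own illustration, assuming the AWS Java SDK v1 DynamoDB Streams client against a placeholder Alternator endpoint) that counts how many stream "shards" DescribeStream reports; running it before and after changing num_tokens should show whether the shard count really scales with the vnode count:

```java
import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreams;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClientBuilder;
import com.amazonaws.services.dynamodbv2.model.*;

public class CountShards {
    public static void main(String[] args) {
        // Placeholder endpoint: replace with an Alternator node, e.g. http://<node>:8080.
        // Credentials come from the default provider chain (any values work when
        // Alternator authorization is not enforced).
        AmazonDynamoDBStreams streams = AmazonDynamoDBStreamsClientBuilder.standard()
                .withEndpointConfiguration(new EndpointConfiguration("http://<alternator-node>:8080", "eu-west-1"))
                .build();

        // Assumes the test table has exactly one stream enabled.
        String arn = streams.listStreams(new ListStreamsRequest().withTableName("usertable_streams"))
                .getStreams().get(0).getStreamArn();

        int shards = 0;
        String lastShardId = null;
        do {
            // DescribeStream is paginated; keep following LastEvaluatedShardId.
            StreamDescription desc = streams.describeStream(new DescribeStreamRequest()
                    .withStreamArn(arn)
                    .withExclusiveStartShardId(lastShardId))
                    .getStreamDescription();
            shards += desc.getShards().size();
            lastShardId = desc.getLastEvaluatedShardId();
        } while (lastShardId != null);

        System.out.println("stream shards: " + shards);
    }
}
```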
The test was rerun using the minimum of 1 num_tokens and the extended KCL debug configuration. For some reason it looks like no data is transferred to the target table. KCL log: https://drive.google.com/file/d/1RBaWDSyNIImpGfRRGSBXZsPFxjg4UEF8/view?usp=sharing Test details:
Restore Monitor Stack command: $ hydra investigate show-monitor 1ad97d55-5bd1-4217-860f-878a8860ae40
Show all stored logs command: $ hydra investigate show-logs 1ad97d55-5bd1-4217-860f-878a8860ae40
Test id: 1ad97d55-5bd1-4217-860f-878a8860ae40
Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/1ad97d55-5bd1-4217-860f-878a8860ae40/20210423_084122/db-cluster-1ad97d55.zip
kubernetes - https://cloudius-jenkins-test.s3.amazonaws.com/1ad97d55-5bd1-4217-860f-878a8860ae40/20210423_084122/kubernetes-1ad97d55.zip
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/1ad97d55-5bd1-4217-860f-878a8860ae40/20210423_084122/loader-set-1ad97d55.zip
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/1ad97d55-5bd1-4217-860f-878a8860ae40/20210423_084122/monitor-set-1ad97d55.zip
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/1ad97d55-5bd1-4217-860f-878a8860ae40/20210423_084122/sct-runner-1ad97d55.zip
@yarongilor a few questions:
The nice thing about the KCL log is that it includes all the DynamoDB API requests and responses (in a hard-to-use way, but we take what we get...). You can extract them with: zgrep -i "sending request: post" logfile.gz
I see that in the beginning, from time 21:27:11 until 21:27:12, there are a bunch of PutItem, UpdateItem and GetItem operations; I'm guessing that this is KCL setting up its own metadata. Then at 21:27:12 we start to see GetShardIterator (with the TRIM_HORIZON option) and GetRecords - but as you noticed, not a single PutItem or UpdateItem is called after all these GetRecords start. It turns out that every GetRecords call returns an empty array [], e.g.,
2021-04-22 21:27:12.996 [ForkJoinPool-1-worker-25] DEBUG org.apache.http.wire - http-outgoing-25 << "{"Records":[],"NextShardIterator":"I7cb297b0-a3b1-11eb-938d-69696ee54350:00000000-0000-0000-0000-000000000000:H178fb6d5066:0fddc45f919a0139eeed7cb02c000031"}"
So there was indeed nothing to be written to the copy table.
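For reference, the same check can be done outside the KCL with a few SDK calls. This is only a sketch (assuming the AWS Java SDK v1 DynamoDB Streams client; the endpoint is a placeholder, while the stream ARN and shard id are the ones from the log above): it requests a TRIM_HORIZON iterator for one shard and prints whether GetRecords returns anything, i.e. the same GetShardIterator/GetRecords sequence the KCL log shows.

```java
import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreams;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClientBuilder;
import com.amazonaws.services.dynamodbv2.model.*;

public class CheckShard {
    public static void main(String[] args) {
        // Placeholder endpoint: replace with an Alternator node, e.g. http://<node>:8080.
        AmazonDynamoDBStreams streams = AmazonDynamoDBStreamsClientBuilder.standard()
                .withEndpointConfiguration(new EndpointConfiguration("http://<alternator-node>:8080", "eu-west-1"))
                .build();

        // Stream ARN and shard id taken from the KCL log above.
        String streamArn = "S7cb297b0-a3b1-11eb-938d-69696ee54350";
        String shardId = "H178fb6d5066:0fddc45f919a0139eeed7cb02c000031";

        // Same call sequence the KCL issues: GetShardIterator(TRIM_HORIZON), then GetRecords.
        String iterator = streams.getShardIterator(new GetShardIteratorRequest()
                .withStreamArn(streamArn)
                .withShardId(shardId)
                .withShardIteratorType(ShardIteratorType.TRIM_HORIZON))
                .getShardIterator();

        GetRecordsResult result = streams.getRecords(new GetRecordsRequest().withShardIterator(iterator));
        System.out.println("records=" + result.getRecords().size()
                + " next=" + result.getNextShardIterator());
    }
}
```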
Are you sure the original table really has data?
@elcallio did you ever test CDC with just a single token per node? Maybe we have bugs in this case, completely unrelated to the Alternator Streams we were looking for?
~/Downloads/logs/sct-runner-1ad97d55$ grep -i 'decommission\|ycsb' events.log
2021-04-22 21:29:08.022: (YcsbStressEvent Severity.NORMAL): type=start node=Node alternator-streams-nemesis-stream-w-loader-node-1ad97d55-2 [63.33.189.19 | 10.0.1.255] (seed: False)
stress_cmd=bin/ycsb load dynamodb -P workloads/workloadc -threads 10 -p recordcount=78125000 -p fieldcount=1 -p fieldlength=128 -p dataintegrity=true -p insertstart=0 -p insertcount=39062500 -p table=usertable_streams -s -P /tmp/dynamodb.properties -p maxexecutiontime=45600
2021-04-22 21:29:19.735: (YcsbStressEvent Severity.NORMAL): type=start node=Node alternator-streams-nemesis-stream-w-loader-node-1ad97d55-3 [34.246.193.196 | 10.0.0.86] (seed: False)
stress_cmd=bin/ycsb load dynamodb -P workloads/workloadc -threads 10 -p recordcount=78125000 -p fieldcount=1 -p fieldlength=128 -p dataintegrity=true -p insertstart=39062500 -p insertcount=39062500 -p table=usertable_streams -s -P /tmp/dynamodb.properties -p maxexecutiontime=45600
2021-04-22 21:44:26.404: (ClusterHealthValidatorEvent Severity.WARNING): type=NodesNemesis node=None message=There are more then expected nodes running nemesis: 10.0.3.238 (non-seed): DecommissionMonkey; 10.0.2.58 (non-seed): Compare tables size by cf-stats
2021-04-22 21:44:26.454: (DisruptionEvent Severity.NORMAL): type=Decommission subtype=start target_node=Node alternator-streams-nemesis-stream-w-db-node-1ad97d55-2 [3.251.79.138 | 10.0.3.238] (seed: False) duration=None
~/Downloads/logs/sct-runner-1ad97d55$ grep -i 'decommission' events.log | grep type=
2021-04-22 21:44:26.404: (ClusterHealthValidatorEvent Severity.WARNING): type=NodesNemesis node=None message=There are more then expected nodes running nemesis: 10.0.3.238 (non-seed): DecommissionMonkey; 10.0.2.58 (non-seed): Compare tables size by cf-stats
2021-04-22 21:44:26.454: (DisruptionEvent Severity.NORMAL): type=Decommission subtype=start target_node=Node alternator-streams-nemesis-stream-w-db-node-1ad97d55-2 [3.251.79.138 | 10.0.3.238] (seed: False) duration=None
2021-04-22 22:09:59.701: (DisruptionEvent Severity.NORMAL): type=Decommission subtype=end target_node=Node alternator-streams-nemesis-stream-w-db-node-1ad97d55-2 [3.251.79.138 | 10.0.3.238] (seed: False) duration=1532
2021-04-23 08:11:42.177: (DisruptionEvent Severity.NORMAL): type=Decommission subtype=start target_node=Node alternator-streams-nemesis-stream-w-db-node-1ad97d55-7 [34.243.251.19 | 10.0.2.214] (seed: False) duration=None
2021-04-23 08:37:07.327: (DisruptionEvent Severity.NORMAL): type=Decommission subtype=end target_node=Node alternator-streams-nemesis-stream-w-db-node-1ad97d55-7 [34.243.251.19 | 10.0.2.214] (seed: False) duration=1524
yarongilor@yaron-pc:~/Downloads/logs/sct-runner-1ad97d55$ head /tmp/kcl-l0-c0-aba64515-5bfc-495a-ad44-3d4aabf7bd50.log.backup2
Starting a Gradle Daemon, 1 incompatible and 1 stopped Daemons could not be reused, use --status for details
Task :compileJava UP-TO-DATE
Task :processResources UP-TO-DATE
Task :classes UP-TO-DATE
SLF4J: Class path contains multiple SLF4J bindings.
Task :run
21:27:05,294 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
SLF4J: Found binding in [jar:file:/root/.gradle/caches/modules-2/files-2.1/ch.qos.logback/logback-classic/1.2.3/7c4f3c474fb2c041d8028740440937705ebb473a/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-log4j12/1.7.5/6edffc576ce104ec769d954618764f39f0f0f10d/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
yarongilor@yaron-pc:~/Downloads/logs/sct-runner-1ad97d55$ tail /tmp/kcl-l0-c0-aba64515-5bfc-495a-ad44-3d4aabf7bd50.log.backup2
2021-04-22 21:28:20.137 [ForkJoinPool-1-worker-25] DEBUG c.a.s.k.c.lib.worker.ProcessTask - Kinesis didn't return any records for shard H178fb6d5066:50700000000000007be604722c000051
2021-04-22 21:28:20.137 [ForkJoinPool-1-worker-29] DEBUG org.apache.http.headers - http-outgoing-15 >> amz-sdk-invocation-id: 71a489bc-94bc-518a-0434-50f101fbd67c
2021-04-22 21:28:20.137 [ForkJoinPool-1-worker-29] DEBUG org.apache.http.headers - http-outgoing-15 >> amz-sdk-retry: 0/0/500
2021-04-22 21:28:20.137 [ForkJoinPool-1-worker-12] DEBUG org.apache.http.headers - http-outgoing-24 << Server: Seastar httpd
2021-04-22 21:28:20.137 [ForkJoinPool-1-worker-4] DEBUG com.amazonaws.request - Received successful response: 200, AWS Request ID: null
2021-04-22 21:28:20.137 [ForkJoinPool-1-worker-12] DEBUG o.a.h.impl.execchain.MainClientExec - Connection can be kept alive for 60000 MILLISECONDS
2021-04-22 21:28:20.137 [ForkJoinPool-1-worker-4] DEBUG com.amazonaws.requestId - x-amzn-RequestId: not available
2021-04-22 21:28:20.137 [ForkJoinPool-1-worker-4] DEBUG com.amazonaws.requestId - AWS Request ID: not available
2021-04-22 21:28:20.137 [ForkJoinPool-1-worker-16] DEBUG org.apache.http.wire - http-outgoing-31 << "Server: Seastar httpd[\r][\n]"
2021-04-22 21:28:20.137 [ForkJoinPool-1-worker-4] DEBUG c.a.s.k.c.lib.worker.ProcessTask - Kinesis didn't return any records for shard H178fb6d5066:0fddc45f919a0139eeed7cb02c000031
3. It makes sense that the KCL thread started 2 minutes before the YCSB stress:
~/Downloads/logs/sct-runner-1ad97d55$ grep YcsbStressEvent events.log | grep type=start
2021-04-22 21:29:08.022: (YcsbStressEvent Severity.NORMAL): type=start node=Node alternator-streams-nemesis-stream-w-loader-node-1ad97d55-2 [63.33.189.19 | 10.0.1.255] (seed: False)
2021-04-22 21:29:19.735: (YcsbStressEvent Severity.NORMAL): type=start node=Node alternator-streams-nemesis-stream-w-loader-node-1ad97d55-3 [34.246.193.196 | 10.0.0.86] (seed: False)
In any case, there are SCT info events that show the size of the source table increasing, like:
(InfoEvent Severity.NORMAL): message=[31661.12052989006/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 0/2942534 ==
@nyh, if it sounds helpful, I can rerun the test with the default num_tokens and compare the results.
Note that the streamdesc returned in the log @yarongilor posted separately is:
{
"StreamDescription":{
"StreamStatus":"ENABLED",
"StreamArn":"S7cb297b0-a3b1-11eb-938d-69696ee54350",
"StreamViewType":"NEW_IMAGE",
"TableName":"usertable_streams",
"KeySchema":[
{
"AttributeName":"p",
"KeyType":"HASH"
}
],
"Shards":[
{
"ShardId":"H178fb6d5066:0c294a6425030839049d71a3c0000021",
"SequenceNumberRange":{
"StartingSequenceNumber":"2552737690387355673456638568324694144"
}
},
{
"ShardId":"H178fb6d5066:0c30000000000000f553cf24c8000021",
"SequenceNumberRange":{
"StartingSequenceNumber":"2552737690387355673456638568324694144"
}
},
{
"ShardId":"H178fb6d5066:0fddc45f919a0139eeed7cb02c000031",
"SequenceNumberRange":{
"StartingSequenceNumber":"2552737690387355673456638568324694144"
}
},
{
"ShardId":"H178fb6d5066:0fe000000000000069236a5ebc000031",
"SequenceNumberRange":{
"StartingSequenceNumber":"2552737690387355673456638568324694144"
}
},
{
"ShardId":"H178fb6d5066:415fa92159361d62c1caace5b8000041",
"SequenceNumberRange":{
"StartingSequenceNumber":"2552737690387355673456638568324694144"
}
},
{
"ShardId":"H178fb6d5066:41600000000000004902bb4ba0000041",
"SequenceNumberRange":{
"StartingSequenceNumber":"2552737690387355673456638568324694144"
}
},
{
"ShardId":"H178fb6d5066:506e7c26c97b3e912ef1ad70f0000051",
"SequenceNumberRange":{
"StartingSequenceNumber":"2552737690387355673456638568324694144"
}
},
{
"ShardId":"H178fb6d5066:50700000000000007be604722c000051",
"SequenceNumberRange":{
"StartingSequenceNumber":"2552737690387355673456638568324694144"
}
},
{
"ShardId":"H178fb6d5066:51be4755ff85087f6e510c7d80000001",
"SequenceNumberRange":{
"StartingSequenceNumber":"2552737690387355673456638568324694144"
}
},
{
"ShardId":"H178fb6d5066:51c0000000000000fea7723b34000001",
"SequenceNumberRange":{
"StartingSequenceNumber":"2552737690387355673456638568324694144"
}
},
{
"ShardId":"H178fb6d5066:991eb91afb9aae3a2cbda52a98000011",
"SequenceNumberRange":{
"StartingSequenceNumber":"2552737690387355673456638568324694144"
}
},
{
"ShardId":"H178fb6d5066:992000000000000001575622b4000011",
"SequenceNumberRange":{
"StartingSequenceNumber":"2552737690387355673456638568324694144"
}
}
]
}
}
This means that:
* We have a very small number of vnodes in this cluster
* There was no topology change inside the TTL of CDC (24h)
* There is no data in the shards above (seemingly, since we get none from reading)
At least the two last points seem wrong/strange(?)
Note: The table can also be younger than the last topology change. Is this the data table being processed (usertable_streams)?
@yarongilor it seems to me that you gave us a KCL log spanning time 21:27:05 through 21:28:20, but the writer only started at 21:29:08, so obviously there will be no data in the table and KCL will not read any data; this log is not interesting and not surprising...
I think what we wanted is a KCL log covering the problem that you had, namely that after returning some data, KCL simply stops returning new data. If you want to limit KCL's log to one minute, pick a minute where KCL seems "hung" and not returning any new data - what we hoped to see was what it was doing at that time.
Everything could have been simpler if this test didn't take 10 hours. Did you try to make it much shorter? Also, does the test failure (KCL hang) depend on having a lot of node operations one after another, or does doing a node operation just once cause the problem? Even if the answer to the last question is no (one topology change doesn't break the client), it would be good to know - it reduces the seriousness of the bug (typical users will not be doing a lot of topology changes in sequence).
* We have a very small number of vnodes in this cluster
Yes, this was a 1-vnode (per node) test. However, @yarongilor, please note that the whole point of the 1-vnode test was to check whether it makes the bug go away or whether the bug remains. Well, you forgot to tell us what the conclusion was - is the bug dependent on having many vnodes, or does it also exist in the 1-vnode case? You don't need any KCL logs to answer that - it's just the question of whether your KCL workload manages to duplicate the table as your test expects, or not.
* There was no topology change inside the TTL of CDC (24h)
Yes, @yarongilor said that this KCL log is from before any data was inserted to the table, and 10 minutes before (!) any topology change. So this KCL log is not interesting.
* There is no data in the shards above (seemingly, since we get none from reading)
At least the two last points seem wrong/strange(?)
It means the KCL log is from the wrong time - before any data existed in the table and before any topology changes :-(
Just to add some more useless info: I re-ran testing of hydra-kcl (i.e. the alternator streams replication process) with an initial two-node cluster, but paused data generation to add a third node, then added a fourth node while actually streaming. Other than some race warnings (expected, apparently) from the shard lease coordinator in kinesis, it worked flawlessly (if ever so slow). I have yet to see that the integration has any issue with CDC generations, even when it splits the data into new shard sets.
Updating with this follow-up test, according to the lighter minimal test Core asked for (#8012 reproduction).
In this test it seems KCL continued working properly after a single decommission and add-node operation.
The two screenshots below show a scenario of:
Therefore, the question now is what the next step is. Is it interesting to proceed by testing with the default number of vnodes?
Installation details
Kernel version: 5.4.0-1035-aws
Scylla version (or git commit hash): 4.5.rc2-0.20210519.5651a20ba
Cluster size: 6 nodes (i3.xlarge)
Scylla running with shards number (live nodes):
alternator-streams-nemesis-stream-w-db-node-05d76290-1 (35.176.133.203 | 10.0.0.219): 4 shards
alternator-streams-nemesis-stream-w-db-node-05d76290-2 (18.132.47.2 | 10.0.3.28): 4 shards
alternator-streams-nemesis-stream-w-db-node-05d76290-3 (35.178.144.251 | 10.0.2.91): 4 shards
alternator-streams-nemesis-stream-w-db-node-05d76290-4 (18.133.238.233 | 10.0.2.240): 4 shards
alternator-streams-nemesis-stream-w-db-node-05d76290-6 (18.130.232.82 | 10.0.1.25): 4 shards
alternator-streams-nemesis-stream-w-db-node-05d76290-7 (3.10.116.23 | 10.0.0.10): 4 shards
Scylla running with shards number (terminated nodes):
alternator-streams-nemesis-stream-w-db-node-05d76290-5 (18.135.6.164 | 10.0.3.217): 4 shards
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0866807c5fe93b793
(aws: eu-west-2)
Test: longevity-alternator-streams-with-nemesis
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Issue description
====================================
PUT ISSUE DESCRIPTION HERE
====================================
Restore Monitor Stack command: $ hydra investigate show-monitor 05d76290-5647-4b1e-941a-d099543aa9ce
Show all stored logs command: $ hydra investigate show-logs 05d76290-5647-4b1e-941a-d099543aa9ce
Test id: 05d76290-5647-4b1e-941a-d099543aa9ce
Logs:
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/05d76290-5647-4b1e-941a-d099543aa9ce/20210523_124344/grafana-screenshot-alternator-20210523_124512-alternator-streams-nemesis-stream-w-monitor-node-05d76290-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/05d76290-5647-4b1e-941a-d099543aa9ce/20210523_124344/grafana-screenshot-longevity-alternator-streams-with-nemesis-scylla-per-server-metrics-nemesis-20210523_124502-alternator-streams-nemesis-stream-w-monitor-node-05d76290-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/05d76290-5647-4b1e-941a-d099543aa9ce/20210523_124344/grafana-screenshot-overview-20210523_124344-alternator-streams-nemesis-stream-w-monitor-node-05d76290-1.png
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/05d76290-5647-4b1e-941a-d099543aa9ce/20210523_124723/db-cluster-05d76290.zip
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/05d76290-5647-4b1e-941a-d099543aa9ce/20210523_124723/loader-set-05d76290.zip
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/05d76290-5647-4b1e-941a-d099543aa9ce/20210523_124723/monitor-set-05d76290.zip
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/05d76290-5647-4b1e-941a-d099543aa9ce/20210523_124723/sct-runner-05d76290.zip
Using a "normal" vnode set should work just fine, given a reasonable interval between topology changes. I think it would be good to run this with a standard node set and a topology change every 15-30 minutes or something. To ensure "slightly" stressed environment works.
The issue is reproduced with the default number of vnodes; see the screenshot. The live monitor is at: http://18.133.175.74:3000/d/alternator-4-5/alternator?orgId=1&refresh=30s&from=now-1h&to=now
| alternator-streams-nemesis-stream-s-monitor-node-71db9528-1 | eu-west-2a | running | 71db9528-8aec-4712-9628-35e6732947a6 | yarongilor | Tue May 25 08:23:24 2021 |
If I read this correctly, it is stuck doing DescribeStream, so what is the frequency of topology changes?
AFAIU, there is only one topology change and that's it.
The live log would suggest something was happening? But not any longer. Did the cluster die? Also, how many shards/node did the cluster have (i.e. how many vnodes total?) I am not great at reading this from prometheus...
Monitor says "down"...
Right, it was set to destroy the nodes at test end. The monitor and logs should now be available in the details below. @elcallio, if needed, the test can be re-run, keeping the nodes.
Installation details
Kernel version: 5.4.0-1035-aws
Scylla version (or git commit hash): 4.5.rc2-0.20210524.b81919dbe with build-id f22c2ac3d227678bb260ee4447b5cdc3f47c6935
Cluster size: 6 nodes (i3.4xlarge)
Scylla running with shards number (live nodes):
alternator-streams-nemesis-stream-s-db-node-71db9528-1 (3.8.117.225 | 10.0.2.236): 14 shards
alternator-streams-nemesis-stream-s-db-node-71db9528-2 (3.8.156.55 | 10.0.1.43): 14 shards
alternator-streams-nemesis-stream-s-db-node-71db9528-3 (35.178.36.120 | 10.0.2.143): 14 shards
alternator-streams-nemesis-stream-s-db-node-71db9528-4 (3.9.117.255 | 10.0.1.167): 14 shards
alternator-streams-nemesis-stream-s-db-node-71db9528-5 (18.133.233.190 | 10.0.3.184): 14 shards
alternator-streams-nemesis-stream-s-db-node-71db9528-7 (3.8.177.149 | 10.0.1.14): 14 shards
Scylla running with shards number (terminated nodes):
alternator-streams-nemesis-stream-s-db-node-71db9528-6 (18.134.151.180 | 10.0.1.49): 14 shards
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0bf97b5ecab8e99e7
(aws: eu-west-2)
Test: longevity-alternator-streams-with-nemesis
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Issue description
====================================
PUT ISSUE DESCRIPTION HERE
====================================
Restore Monitor Stack command: $ hydra investigate show-monitor 71db9528-8aec-4712-9628-35e6732947a6
Show all stored logs command: $ hydra investigate show-logs 71db9528-8aec-4712-9628-35e6732947a6
Test id: 71db9528-8aec-4712-9628-35e6732947a6
Logs:
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/71db9528-8aec-4712-9628-35e6732947a6/20210525_102128/grafana-screenshot-alternator-20210525_102313-alternator-streams-nemesis-stream-s-monitor-node-71db9528-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/71db9528-8aec-4712-9628-35e6732947a6/20210525_102128/grafana-screenshot-longevity-alternator-streams-with-nemesis-scylla-per-server-metrics-nemesis-20210525_102302-alternator-streams-nemesis-stream-s-monitor-node-71db9528-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/71db9528-8aec-4712-9628-35e6732947a6/20210525_102128/grafana-screenshot-overview-20210525_102128-alternator-streams-nemesis-stream-s-monitor-node-71db9528-1.png
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/71db9528-8aec-4712-9628-35e6732947a6/20210525_102559/db-cluster-71db9528.zip
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/71db9528-8aec-4712-9628-35e6732947a6/20210525_102559/loader-set-71db9528.zip
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/71db9528-8aec-4712-9628-35e6732947a6/20210525_102559/monitor-set-71db9528.zip
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/71db9528-8aec-4712-9628-35e6732947a6/20210525_102559/sct-runner-71db9528.zip
It looks like the test was still doing UpdateItem calls when it got terminated? Why was it terminated? What determined that it failed? Are we sure it did not just take a looong time?
Monitor says "down"...
The nodes are terminated, but the monitor is alive and can show all test details. One unclear issue is that Grafana reports no GetRecords or GetShardIterator calls after the nemesis, yet it does report continuing writes. The writes can come from either YCSB or KCL, but they seem to continue after the YCSB stress finished as well. @elcallio, does it make sense that only UpdateItem continues, while none of the other Streams queries appear for all this time?
Yeah, I would somewhat want to look at the running cluster once it gets stuck.
One thing: I don't think the KCL in this test is running with periodic shard sync; instead it uses sync-on-end-of-shard. This is not very good for Alternator, since we have a gazillion shards, most of them empty. This would sort of explain why we seem to be doing an exponential number of shard syncs (vis-a-vis the actual shards existing). The kinesis library is very much written for situations where a stream has ~1-2 shards.
I sent a PR ages ago (probably borked by now) that changed KCL to use periodic shard sync.
The test is rerunning with periodic sync mode. It still looks like the GetRecords and GetShardIterator APIs drop to zero after the decommission nemesis; only UpdateItem continues. Live monitor: http://18.130.33.79:3000/d/alternator-4-5/alternator?orgId=1&refresh=10s&from=now-1h&to=now
| alternator-streams-nemesis-stream-s-monitor-node-da4dae8e-1 | eu-west-2a | running | da4dae8e-cf75-41f2-9d5b-fbe3343a6b44 | yarongilor | Wed May 26 13:05:42 2021 |
Is there a kinesis log available?
Is there a kinesis log available?
ssh -l ubuntu -i ~/.ssh/scylla-qa-ec2 3.8.101.183 tail -f /home/ubuntu/sct-results/latest/alternator-streams-nemesis-stream-s-loader-set-da4dae8e/alternator-streams-nemesis-stream-s-loader-node-da4dae8e-1/kcl-l0-c0-154cb444-da38-47b8-b7d6-0c8311571005.log
I just realized that the kinesis library uses a hardcoded period of 1000 ms for the sync task. I very much doubt it can finish even the ~1200 calls it requires for syncing in that time. So we might be hammering DescribeStream queries instead. Still strange that it is apparently completely starved.
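To illustrate the timing (a toy sketch, not the kinesis library's actual code): with scheduleAtFixedRate at a 1000 ms period, a sync round that needs ~1200 sequential DescribeStream round trips overruns the period many times over, so the next round starts as soon as the previous one finishes and the sync task ends up running back-to-back with no idle time:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicSyncToy {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Hardcoded 1 s period, mimicking the library's sync task schedule.
        scheduler.scheduleAtFixedRate(PeriodicSyncToy::shardSync, 0, 1000, TimeUnit.MILLISECONDS);
    }

    static void shardSync() {
        long start = System.currentTimeMillis();
        // Pretend each of ~1200 shards needs one DescribeStream round trip of a few ms.
        for (int i = 0; i < 1200; i++) {
            simulateDescribeStreamCall(5);
        }
        System.out.println("sync round took " + (System.currentTimeMillis() - start) + " ms");
        // scheduleAtFixedRate never runs rounds concurrently, but once a round overruns
        // the period, the next one starts immediately - so this task runs continuously.
    }

    static void simulateDescribeStreamCall(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```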
Unfortunately, this was not run with debug logging, so it is not possible to say which sync belongs to which iteration. But it is quite obvious that it is doing shard sync continuously, which explains the slowness. Has the test finished/succeeded in copying the data? How much has it copied, if any?
Also: the logs indicate communication failures at one point (connection failed). Did the test kill a node somewhere, not just add?
Also: the logs indicate communication failures at one point (connection failed). Did the test kill a node somewhere, not just add?
The test decommissioned one node, then added a new one.
Unfortunately, this was not run with debug logging, so it is not possible to say which sync belongs to which iteration. But it is quite obvious that it is doing shard sync continuously. Which explains the slowness. The test has not finished/succeeded in copying the data? How much has it copied, if any?
The test successfully copied about a quarter of the data, then stopped copying entirely at the point of the decommission nemesis.
ubuntu@ip-10-0-0-94:~/sct-results/latest$ grep 'c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread' sct.log
< t:2021-05-26 13:34:44,960 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 0/2986 ==
< t:2021-05-26 13:37:14,167 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 208583/288284 ==
< t:2021-05-26 13:39:51,250 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 590279/701374 ==
< t:2021-05-26 13:42:41,562 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 902205/1087922 ==
< t:2021-05-26 13:45:11,154 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1060908/1258886 ==
< t:2021-05-26 13:47:32,582 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1031642/1577025 ==
< t:2021-05-26 13:50:07,980 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1192347/2135199 ==
< t:2021-05-26 13:52:30,679 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1031642/2305248 ==
< t:2021-05-26 13:55:07,709 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1190842/2877722 ==
< t:2021-05-26 13:57:35,909 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1192347/3216292 ==
< t:2021-05-26 14:00:02,220 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1190842/3594297 ==
< t:2021-05-26 14:02:36,355 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1060908/3671122 ==
< t:2021-05-26 14:04:56,317 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1190842/4236765 ==
< t:2021-05-26 14:07:15,766 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1192347/4205490 ==
< t:2021-05-26 14:09:48,338 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1001243/3855921 ==
< t:2021-05-26 14:12:15,490 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1050424/3759807 ==
< t:2021-05-26 14:14:24,621 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1001243/3855921 ==
< t:2021-05-26 14:16:34,578 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1190842/4242057 ==
< t:2021-05-26 14:18:45,956 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1001243/3855921 ==
< t:2021-05-26 14:20:57,986 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1060908/3925000 ==
< t:2021-05-26 14:23:21,491 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1031642/4016580 ==
< t:2021-05-26 14:25:35,211 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1001243/3855921 ==
< t:2021-05-26 14:27:44,683 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1060908/3925000 ==
< t:2021-05-26 14:29:54,076 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1190842/4242057 ==
< t:2021-05-26 14:32:11,017 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1192347/4205490 ==
< t:2021-05-26 14:34:20,627 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1192347/4205490 ==
< t:2021-05-26 14:36:29,758 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1050424/3759807 ==
< t:2021-05-26 14:38:38,653 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1031642/4016580 ==
< t:2021-05-26 14:40:48,344 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1192347/4205490 ==
< t:2021-05-26 14:43:01,487 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1050424/3759807 ==
< t:2021-05-26 14:45:10,522 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1031642/4016580 ==
< t:2021-05-26 14:47:19,579 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1190842/4242057 ==
< t:2021-05-26 14:49:33,470 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1050424/3759807 ==
< t:2021-05-26 14:51:42,332 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1031642/4016580 ==
< t:2021-05-26 14:53:51,949 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1060908/3925000 ==
< t:2021-05-26 14:56:03,734 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1192347/4205490 ==
< t:2021-05-26 14:58:13,235 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1192347/4205490 ==
< t:2021-05-26 15:00:24,624 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1031642/4016580 ==
< t:2021-05-26 15:02:33,849 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1050424/3759807 ==
< t:2021-05-26 15:04:42,659 f:kcl_thread.py l:133 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1031642/4016580 ==
It looks like the node decommissioned was in fact the coordinator/endpoint node for alternator?
I notice that the KCL code does not handle write failures, i.e. when the Alternator endpoint died, it simply dropped the records being written.
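For what it's worth, here is a sketch of the kind of retry the copier could add around its writes. This is a hypothetical helper (it is not part of hydra-kcl), assuming the SDK v1 AmazonDynamoDB client the worker already uses:

```java
import com.amazonaws.AmazonClientException;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.model.PutItemRequest;

public class RetryingWriter {
    // Retry a PutItem a few times with linear backoff instead of silently
    // dropping the record when the coordinator node goes away mid-request.
    static void putWithRetry(AmazonDynamoDB ddb, PutItemRequest request, int attempts) {
        AmazonClientException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                ddb.putItem(request);
                return;                         // write succeeded
            } catch (AmazonClientException e) {
                last = e;                       // connection refused / 5xx from a dying endpoint, etc.
                try {
                    Thread.sleep(1000L * (i + 1));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        // Surface the failure instead of dropping the record on the floor.
        throw (last != null ? last : new AmazonClientException("putItem failed"));
    }
}
```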
It looks like the node decommissioned was in fact the coordinator/endpoint node for alternator?
The logs seem to show 2 different nodes - the KCL target node is 10.0.1.224:
centos 15330 0.0 0.2 1078608 67824 ? Ssl 13:32 0:00 docker exec -i d0e8362b5100bb818d1d4c9390b046c83228fcd1f1bcebc154b4e9987198d3f5 /bin/bash -c cd /hydra-kcl && ./gradlew run --args=' -t usertable_streams --timeout 32400 -e http://10.0.1.224:8080 '
root 15364 0.3 0.7 12126936 240656 ? Ssl 13:32 0:06 /usr/local/openjdk-8/bin/java -Dorg.gradle.appname=gradlew -classpath /hydra-kcl/gradle/wrapper/gradle-wrapper.jar org.gradle.wrapper.GradleWrapperMain run --args= -t usertable_streams --timeout 32400 -e http://10.0.1.224:8080
The decommissioned node is 10.0.0.166:
< t:2021-05-26 13:37:46,792 f:cluster.py l:2766 c:sdcm.cluster_aws p:DEBUG > Node alternator-streams-nemesis-stream-s-db-node-da4dae8e-3 [18.130.99.162 | 10.0.0.166] (seed: False): Command '/usr/bin/nodetool decommission ' duration -> 69.36686605100022 s
< t:2021-05-26 13:37:46,825 f:file_logger.py l:80 c:sdcm.sct_events.file_logger p:INFO > 2021-05-26 13:37:46.804: (NodetoolEvent Severity.NORMAL): type=decommission subtype=end node=alternator-streams-nemesis-stream-s-db-node-da4dae8e-3 duration=69.36686605100022s
Yes, but this code is AFAIK using the load-balancing request handler, so we can very well hit the down node, since there is a delay in updating liveness.
How many threads is this running with (the --threads argument to kcl)? It might be an idea to bounce this up from the default of num CPUs, since kinesis does some wacky things, and we might want to increase parallelism at the risk of contention...
And by bounce up, I mean increase to maybe a few hundred or something. Though even here, kinesis is likely to overwhelm concurrency with describestream tasks.
So this is not gonna work. Or at least not for anything beyond 1-3 cores or so. And even then, very slowly.
The heart of the problem is that kinesis is written rather terribly. It creates tasks for every shard (active or not - the ones with parents will "wait" for parent processing). Already here we can start to worry, since we typically have in the order of 10k shards or so in any given cluster.
But of course it gets worse. When executed, these "shard consumers" will attempt to read data. If a consumer gets data, all is well: it reports it and finishes, adding a new task to continue reading. But if it gets no data, it will issue a Thread.sleep to pause.
But wait, you say. That is not good when we are essentially using threads (in a workpool). We are squandering an execution resource. Possibly halting execution. Yes. Very much so. In fact, this is why KCL is so g**damn slow.
If run against a single-node cluster with one core, and setting thread pool size to something slightly larger than shard count, the test/program executes in ~2s. But once we hit thread pool limits and start sleeping, the execution slows to a crawl.
Now you ask, does this get worse with shard sync/cluster topology changes? Yes, yes it does. Very much so. Because either we run into the dreaded end-of-shard sync, which issues num-shards new syncs that also wait for each other but wholly ignore whether the retrieved data says we could skip reading from the server; or, if we run periodic sync, we run into the fact that the period is hardcoded, and a pretty short period at that. But wait, there is a parameter for this, is there not? Yes. But it causes another sleep to be issued, blocking things even more.
Bottom line: to have this work in any reliable manner, you need a thread pool that is num-shards large, plus some extra. But since we have so many shards, this is not feasible. The app, and moreover the IO (sockets), die a horrible death.
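A toy sketch of that failure mode (deliberately not KCL's actual code, just the same pattern): a fixed work pool in which every "shard consumer" that finds no data sleeps on its pool thread before re-enqueueing itself. As soon as the number of empty shards exceeds the pool size, the pool spends almost all of its time asleep:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class SleepyPoolToy {
    static final int SHARDS = 10_000;      // order of magnitude of stream shards in a cluster
    static final int POOL_SIZE = 32;       // typical worker pool size
    static final AtomicLong recordsRead = new AtomicLong();

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        // One task per shard, like one shard consumer per stream shard.
        for (int shard = 0; shard < SHARDS; shard++) {
            final int id = shard;
            pool.submit(() -> consumeShard(pool, id));
        }
        // Run it for a while and watch how few "reads" actually happen.
    }

    static void consumeShard(ExecutorService pool, int shard) {
        boolean gotData = shard % 1000 == 0;   // pretend ~0.1% of shards have data
        if (gotData) {
            System.out.println("shard " + shard + " read data, total=" + recordsRead.incrementAndGet());
        } else {
            // The problematic pattern: blocking an execution resource with a sleep
            // instead of rescheduling the poll for later.
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        pool.submit(() -> consumeShard(pool, shard));   // re-enqueue, like a new task per iteration
    }
}
```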
One option is to fork the library and rewrite it to use futures + scheduled executors to do proper job scheduling instead (like the CDC lib).
Unfortunately, because of arbitrary and also hardcoded limitations on identifiers in Dynamo streams (streampos, I am looking at you), the option of making shards a set of stream IDs does not work. We can't represent enough data in the field.
Should really have determined this before. As usual, the culprit is that while this library/program works, it cannot scale in either cluster size or topology history.
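For contrast, a sketch of the futures + scheduled-executor shape mentioned above (again just an illustration of the scheduling model, not an actual rewrite of the library): an empty poll does not block a thread, it only schedules the next poll for later, so a handful of threads can service tens of thousands of shards.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduledPollerToy {
    static final ScheduledExecutorService scheduler =
            Executors.newScheduledThreadPool(Runtime.getRuntime().availableProcessors());

    public static void main(String[] args) {
        for (int shard = 0; shard < 10_000; shard++) {
            pollShard(shard);
        }
    }

    static void pollShard(int shard) {
        fetchRecordsAsync(shard).thenAccept(count -> {
            if (count > 0) {
                // Process the records, then poll again right away.
                pollShard(shard);
            } else {
                // Nothing to read: hand the delay to the timer instead of a thread.
                scheduler.schedule(() -> pollShard(shard), 1, TimeUnit.SECONDS);
            }
        });
    }

    // Stand-in for an async GetRecords call; a real implementation would use a
    // non-blocking HTTP client (as the SDK v2 / kinesis 2.x clients do).
    static CompletableFuture<Integer> fetchRecordsAsync(int shard) {
        return CompletableFuture.supplyAsync(() -> shard % 1000 == 0 ? 1 : 0, scheduler);
    }
}
```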
So, an update on looking at alternate solutions. The good news is that the DynamoDB SDK v2 has cleaned up its code: not only is it now harder to read and allocates more transient objects, but it also actually fixed its execution model and is now fully future driven. Even better, kinesis 2.x is also future based, and can indeed execute fully single-threaded should the need arise. The bad news is that kinesis 2.x does not use/support DynamoDB Streams; it uses Kinesis streams instead. https://docs.aws.amazon.com/kinesis/latest/APIReference/API_Operations.html
So while we can write a DynamoDB copier using the raw API, all the magic lease (stream position) management you get with kinesis is lost and needs to be reinvented from scratch.
The bad news is that kinesis 2.x does not use/support dynamodb streams. They instead use kinesis streams.
Wow, I didn't notice that this happened during the covid period: https://aws.amazon.com/about-aws/whats-new/2020/11/now-you-can-use-amazon-kinesis-data-streams-to-capture-item-level-changes-in-your-amazon-dynamodb-table/ I was sure that DynamoDB Streams was already almost identical to Kinesis, but I guess it wasn't identical enough? I'll open a new issue about that.
Hm... how far apart are the APIs?
In Kinesis 1.x, DynamoDB streams was not actually supported by kinesis itself, but through the project https://github.com/awslabs/dynamodb-streams-kinesis-adapter. This has not been updated to support 2.x, and according to the issues page, there are no plans to do so.
The kinesis APIs are semantically similar, but very different between versions. What is worse, as for the places where one would hook in a library/adapter, they seem to have almost intentionally made it difficult, so that you can't effectively do it without basically implementing the whole upper transport layer. If I were cynical, I would say this is an intentional strategy to make people move to using Kinesis streams with their DynamoDB, which I assume costs more.
Forking dynamodb-streams-kinesis-adapter and moving it to the 2.x code base is not a trivial task.
@roydahan, @slivne, I think we can close this issue, following @elcallio's introduction of https://github.com/elcallio/kraken-dynamo. The new lib is already tested and encounters issues.
Pushing this out - let's go back to this on the next CDC call.
This has been pushed out several times from version to version. Removing it from 5.2 to the large backlog.
@yarongilor please close it if it's not relevant anymore (per your comment from 2021).
/Cc @roydahan
Installation details
Scylla version (or git commit hash): 4.4.dev-0.20210104.d5da455d9 with build-id 56ab4a808a56049c86c5af7a0f4a680dc37f36d1
Cluster size: 6 nodes (i3.4xlarge)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-018ec229a4231314c (aws: eu-west-1)
Test: longevity-alternator-streams-with-nemesis
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Issue description
====================================
scenario:
==> result: YCSB reported verification errors as below.
====================================
Restore Monitor Stack command:
$ hydra investigate show-monitor b71161a4-182a-4447-8a35-9e9c63e89b16
Show all stored logs command: $ hydra investigate show-logs b71161a4-182a-4447-8a35-9e9c63e89b16
Test id:
b71161a4-182a-4447-8a35-9e9c63e89b16
Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/b71161a4-182a-4447-8a35-9e9c63e89b16/20210105_233028/db-cluster-b71161a4.zip
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/b71161a4-182a-4447-8a35-9e9c63e89b16/20210105_233028/loader-set-b71161a4.zip
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/b71161a4-182a-4447-8a35-9e9c63e89b16/20210105_233028/monitor-set-b71161a4.zip
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/b71161a4-182a-4447-8a35-9e9c63e89b16/20210105_233028/sct-runner-b71161a4.zip
Jenkins job URL
YCSB verification error:
The YCSB insert and read commands, as found in the SCT yaml file, are:
Some temporary YCSB write connection issues were noticed during the test, probably due to decommissions, but overall it reported a successful run:
The second YCSB thread, running the second range of inserts, reported successful writes as well:
The issue was also reproduced on another run, using a ChaosMonkey of multiple nemeses instead of the Decommission nemesis: