@fruch, please review / confirm assigning this to Dev.
Cc @elcallio, @nyh
I need help understanding what problem you actually see.
- You said there is a "YCSB verification error" and quoted some lines, but I don't understand at all what this is saying, or what kind of error this is... Are there missing rows? Wrong rows? Or what? It would be helpful to know what kind of error we see, not just that it "failed".
- Is the write workload using CL=QUORUM and the read using CL=QUORUM as well?
- Are we sure that the workload had zero write failures? A write error which is retried is fine. If there is a write error which is not retried, it can lead to only one replica succeeding, and cause random results from reads (of both the base table and the CDC log table, separately).
All that being said, since handling cluster changes - as you have in this case - is the most complicated part of CDC, and Alternator Streams makes it even more complicated - I wouldn't be surprised if this is indeed an Alternator or CDC bug. @yarongilor, @fruch, can you make an attempt to prepare a minimal reproducer? Maybe it doesn't need a 10-hour run and hundreds of node decommissions - maybe it can be reproduced with a much shorter run and just one decommission? (I don't know if it can, but maybe...)
We'll try to see if we can come up with a smaller-sized reproducer, but it might take us a bit more than the 10 hours it's gonna run.
1) We are not running YCSB with full logs; we could try, but that would generate a ton of them.
2) I'm not sure why you are asking about CL? It's Alternator, right? We have control only over reads... and in those cases the cluster was healthy when reading (i.e. it wasn't inside of a nemesis).
3) We'll need to check for write failures; we didn't take a closer look.
On top of that we have a few issues on the test end, for example not collecting the hydra-kcl log (i.e. the log of the process which reads from the stream).
We might need a bit of help defining which important parts we should add logs to for debugging (at the KCL end).
One more option is to try a reproducer in dtest, without KCL (to begin with), i.e. reading from the stream while doing decommission or other topology-change tasks. It's a task on our backlog, but it's not started yet. We currently only have a test that reads the stream before and after the decommission.
The run mentioned here is also with an old version of monitoring; we might have a better view with a monitoring version that has a dashboard for the Alternator Streams APIs, and we'll be running next with the newer version.
@yarongilor I am asking all these questions because if I were tasked to debug this, I would have no idea what your workload is doing, or what exactly is the "proof" that there is even a problem. How does the test recognize an error? It sees that the two tables are different? Do we know what is different - does one table (which one?) have more rows? Errors in the data? Or what?
You're right that Alternator writes are always CL=QUORUM. Even if all writes succeeded (did they? it's important to know that...), you still need to do CL=QUORUM reads (i.e., Alternator consistent reads), otherwise you can miss some data. You can't say that the cluster is "healthy" if you didn't do a repair - did you?
I'll check that we are using consistent reads in the verification stage; we might have missed that.
We by default use consistent reads (on all YCSB runs): https://github.com/scylladb/scylla-cluster-tests/blob/master/sdcm/ycsb_thread.py#L134
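For reference, a "consistent read" at the DynamoDB API level is just the ConsistentRead flag on the read call, which Alternator serves at CL=LOCAL_QUORUM. A minimal boto3 sketch of what the verification read should look like (the endpoint, credentials and key value are placeholders, not the test's actual configuration):

```python
import boto3

# Placeholders: point this at the Alternator endpoint used by the test.
dynamodb = boto3.resource('dynamodb',
                          endpoint_url='http://127.0.0.1:8080',
                          region_name='us-east-1',
                          aws_access_key_id='alternator',
                          aws_secret_access_key='secret_pass')
table = dynamodb.Table('usertable')

# ConsistentRead=True is the part that matters: without it the read is
# eventually consistent and can legitimately miss recently-written data.
item = table.get_item(Key={'p': 'user1'}, ConsistentRead=True).get('Item')
print(item)
```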
On Tue, Feb 2, 2021 at 3:41 PM nyh notifications@github.com wrote:
> Some more thoughts about this test:
> - An idea raised by @slivne is that we should run the verification ycsb also on the base table - maybe some data is missing already there, and not just in the copy?
> - The YCSB verification code - does it use the DynamoDB API? If so, is it using consistent read (which translates to CL=local_quorum)? If not, it should.
> - I would like to be confident that none of the base-table writes failed, and also that the KCL code did not encounter any write errors, which could mean that some data was read from the stream but just failed to be written to the second table.
> - Maybe @asias or @slivne can help me understand here, but is "nodetool decommission" and then adding a new node even safe in the sense of consistency? If we take three nodes and decommission one, isn't it possible that we are left with two replicas with different values for a row (old and new), and then when we add a new node, it copies data from just one of them, and may copy the old version - and now we have two copies of the old value and one of the new, and a quorum read can read old data?
In this case it's not safe - you are losing one replica.
Yet in any other case in which 3 replicas remain, the process is supposed to be safe.
Starting from 3 nodes, adding one, decommissioning another is safe. Starting from 4 nodes, decommissioning one, adding another is safe.
@fruch can you tell me a bit about what the "decommission nemesis" is doing? Is it adding a node and then decommissioning a node, or the opposite order?
@slivne is the safe decommission procedure documented somewhere? I'm not sure I understand why even your order is safe: consider we have 3 replicas, two with the new value, one with the old. Now we add a node. That, if I understand correctly, streams some random copy to the new node - so the new node can get the old value. At this point we have 4 replicas, two new values, two old values. Now we do a decommission. Let's say we decommission a node with the new value. If I understand correctly (correct me if I'm wrong) it sends this new value to just one (not all, as in repair) of the other nodes. If it happens to send it to the node that already had the new value, we end up with 3 replicas, one new and two old - bad.
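To make the counting concrete, here is a toy enumeration of that scenario - a deliberately simplified model of bootstrap/decommission streaming, not Scylla's actual logic:

```python
# Toy model: 3 replicas, two holding the NEW value, one holding the OLD value.
start = ('new', 'new', 'old')

bad = set()
for source in start:                        # bootstrap copies one replica
    cluster = start + (source,)
    for leaving in range(4):                # decommission any one node
        rest = [v for i, v in enumerate(cluster) if i != leaving]
        for target in range(3):             # leaving node streams to one node
            merged = list(rest)
            if cluster[leaving] == 'new':
                merged[target] = 'new'      # the newer write wins on the target
            if merged.count('old') >= 2:    # a quorum (2 of 3) can now read OLD
                bad.add(tuple(sorted(merged)))

print(bad)  # {('new', 'old', 'old')} - a stale quorum read is possible
```

The only bad path this model finds is the one described above: bootstrap copies the old value, and the decommissioned node streams its new value to a replica that already had it.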
Thanks @nyh. What currently slows down the clarification of the issue is that it is not reproduced consistently. Once we come up with a more consistent reproduction, we can try narrowing down the scenario.
Perhaps a somewhat useful new insight is that the issue wasn't reproduced with the following nemeses, where no decommission ran:
restart_then_repair_node
major_compaction
add_drop_column
multiple_hard_reboot_node
nodetool_drain
mgmt_backup_specific_keyspaces
snapshot_operations
multiple_hard_reboot_node
mgmt_repair_cli
kill_scylla
nodetool_cleanup
run_cdcstressor_tool
repair_streaming_err
The issue did reproduce in another test that ran some kind of remove-node-then-add-node, with this nemesis sequence:
mgmt_repair_cli
run_unique_sequence
remove_node_then_add_node
mgmt_repair_cli
multiple_hard_reboot_node
nodetool_enospc
restart_then_repair_node
nodetool_drain
decommission_streaming_err
network_reject_thrift
truncate_large_partition
> @fruch can you tell me a bit about what the "decommission nemesis" is doing? Is it adding a node and then decommissioning a node, or the opposite order?
In our test there are 6 nodes; the decommission nemesis is:
1) decommission one node using `nodetool decommission`
2) delete this node from AWS
3) provision and add a new node (wait for streaming to end, and for the node to be available)
4) run `nodetool cleanup` on all of the nodes
(basically following this: https://docs.scylladb.com/operating-scylla/procedures/cluster-management/remove_node/)
Thanks @fruch. It seems it is using the reverse order from the instructions? Please see above why I think it can lead to only one out of three replicas having the correct content, and wrong quorum-read results. Are you really using the reverse order, or did you just explain it here in the wrong order?
I'm not sure (not even close to sure...) that this is the problem we are seeing here, though. It might be a completely unrelated red herring.
@nyh, @fruch, there may be a bigger hint at the source of the problem. Look at the periodic cfstats report of the source and destination stream tables: the number of partitions reported for the destination table ceases to increase after the first decommission, while in another test without decommission it kept increasing, tracking the source table, up to the end of the writes.
The number of partitions in the dest table gets stuck around 800K-1M after the decommission until the write load ends, while the source table gets to 40M (ignore the "has not been published or dumped, maybe you missed .publish()" testing issue):
< t:2021-02-02 13:54:54,214 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 0/0 ==
< t:2021-02-02 13:54:54,214 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[10.113263845443726/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 0/0 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 13:55:11,490 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 0/13707 ==
< t:2021-02-02 13:55:11,490 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[27.389247179031372/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 0/13707 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 13:55:35,167 f:nemesis.py l:566 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.DecommissionMonkey: Set current_disruption -> DecommissionMonkey on target node Node alternator-streams-nemesis-stream-w-db-node-3a649026-6 [54.75.30.201 | 10.0.2.194] (seed: False)
< t:2021-02-02 13:56:13,900 f:nemesis.py l:566 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.DecommissionMonkey: Set current_disruption -> Decommission Node alternator-streams-nemesis-stream-w-db-node-3a649026-6 [54.75.30.201 | 10.0.2.194] (seed: False)
< t:2021-02-02 13:57:38,338 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 144306/307163 ==
< t:2021-02-02 13:57:38,338 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[174.23685503005981/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 144306/307163 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:00:05,646 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 387845/563285 ==
< t:2021-02-02 14:00:05,646 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[321.54533791542053/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 387845/563285 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:02:40,940 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 614767/791003 ==
< t:2021-02-02 14:02:40,940 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[476.83937549591064/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 614767/791003 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:05:08,205 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 834305/1013601 ==
< t:2021-02-02 14:05:08,205 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[624.1042518615723/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 834305/1013601 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:07:26,488 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 828948/1109028 ==
< t:2021-02-02 14:07:26,488 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[762.3874325752258/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 828948/1109028 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:09:47,539 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 851657/1404906 ==
< t:2021-02-02 14:09:47,539 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[903.4381284713745/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 851657/1404906 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:12:33,990 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1019046/1920401 ==
< t:2021-02-02 14:12:33,990 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[1069.8891949653625/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1019046/1920401 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:14:51,424 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1018787/2176932 ==
< t:2021-02-02 14:14:51,424 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[1207.3235836029053/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1018787/2176932 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:17:13,430 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 828948/2162728 ==
< t:2021-02-02 14:17:13,430 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[1349.3288397789001/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 828948/2162728 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:19:47,497 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/2739892 ==
< t:2021-02-02 14:19:47,497 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[1503.3963222503662/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/2739892 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:22:14,230 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1018787/3001382 ==
< t:2021-02-02 14:22:14,230 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[1650.1296529769897/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1018787/3001382 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:24:42,605 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 851657/3039601 ==
< t:2021-02-02 14:24:42,605 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[1798.504313468933/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 851657/3039601 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:27:04,404 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 828948/3210123 ==
< t:2021-02-02 14:27:04,404 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[1940.303292274475/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 828948/3210123 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:29:29,591 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/3839237 ==
< t:2021-02-02 14:29:29,592 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[2085.490391731262/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/3839237 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:31:53,517 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 851657/3823202 ==
< t:2021-02-02 14:31:53,517 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[2229.416089773178/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 851657/3823202 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:34:16,669 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1074511/4565785 ==
< t:2021-02-02 14:34:16,669 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[2372.5681977272034/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1074511/4565785 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:36:39,111 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/4653125 ==
< t:2021-02-02 14:36:39,112 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[2515.0106549263/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/4653125 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:37:35,934 f:nemesis.py l:566 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.DecommissionMonkey: Set current_disruption -> DecommissionMonkey on target node Node alternator-streams-nemesis-stream-w-db-node-3a649026-2 [54.229.107.68 | 10.0.0.138] (seed: False)
< t:2021-02-02 14:38:09,036 f:nemesis.py l:566 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.DecommissionMonkey: Set current_disruption -> Decommission Node alternator-streams-nemesis-stream-w-db-node-3a649026-2 [54.229.107.68 | 10.0.0.138] (seed: False)
< t:2021-02-02 14:38:57,808 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/5824079 ==
< t:2021-02-02 14:38:57,808 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[2653.70747923851/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1016831/5824079 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:41:16,517 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1144310/6333107 ==
< t:2021-02-02 14:41:16,518 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[2792.416520357132/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1144310/6333107 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:43:36,492 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1144310/6652175 ==
< t:2021-02-02 14:43:36,492 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[2932.3915326595306/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1144310/6652175 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:46:08,000 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 984475/6158596 ==
< t:2021-02-02 14:46:08,000 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[3083.8988795280457/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 984475/6158596 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:48:58,215 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1148175/7051212 ==
< t:2021-02-02 14:48:58,215 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[3254.1145927906036/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1148175/7051212 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:51:24,049 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 1048733/6952274 ==
< t:2021-02-02 14:51:24,049 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[3399.9478480815887/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1048733/6952274 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:53:53,908 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 878938/6552516 ==
< t:2021-02-02 14:53:53,908 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[3549.806868314743/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 878938/6552516 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:56:21,508 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 870453/6671281 ==
< t:2021-02-02 14:56:21,508 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[3697.406831264496/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 870453/6671281 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 14:58:46,342 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 853205/6697584 ==
< t:2021-02-02 14:58:46,342 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[3842.2413997650146/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 853205/6697584 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:01:11,880 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 848879/7576755 ==
< t:2021-02-02 15:01:11,880 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[3987.7790808677673/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 848879/7576755 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:03:35,490 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 870453/7487232 ==
< t:2021-02-02 15:03:35,490 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[4131.389223098755/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 870453/7487232 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:06:20,902 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 895384/7874602 ==
< t:2021-02-02 15:06:20,902 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[4296.800801992416/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 895384/7874602 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:08:42,948 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 848879/8378593 ==
< t:2021-02-02 15:08:42,949 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[4438.84773850441/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 848879/8378593 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:11:05,081 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 870453/8304205 ==
< t:2021-02-02 15:11:05,081 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[4580.979810476303/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 870453/8304205 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:13:24,189 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 848879/8900536 ==
< t:2021-02-02 15:13:24,189 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[4720.088034391403/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 848879/8900536 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:15:43,068 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 848879/9152973 ==
< t:2021-02-02 15:15:43,069 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[4858.967678070068/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 848879/9152973 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:18:15,759 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 878938/9341924 ==
< t:2021-02-02 15:18:15,759 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[5011.658487081528/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 878938/9341924 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:20:35,220 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 848879/9692553 ==
< t:2021-02-02 15:20:35,220 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[5151.119571447372/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 848879/9692553 == has not been published or dumped, maybe you missed .publish()
< t:2021-02-02 15:23:05,342 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 853205/9387780 ==
< t:2021-02-02 15:23:05,342 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[5301.241162776947/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 853205/9387780 == has not been published or dumped, maybe you missed .publish()
...
< t:2021-02-02 22:56:12,256 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[32488.155284643173/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 1008115/45403397 == has not been published or dumped, maybe you missed .publish()
In the other, successful test, it reported almost the same number for both tables at the end of the write stress:
< t:2021-01-29 00:28:39,859 f:base.py l:206 c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (InfoEvent Severity.NORMAL): message=[32459.151389598846/32400] == CompareTablesSizesThread: dst table/src table number of partitions: 40845578/40845538 == has not been published or dumped, maybe you missed .publish()
I guess we'd better look at the KCL log for stream failures after the point of the decommission.
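For reference, the partition counts in those log lines are per-table estimates from nodetool cfstats; a minimal sketch of reproducing the comparison by hand (the host and keyspace/table names are illustrative placeholders, not the exact SCT code):

```python
import re
import subprocess

def estimated_partitions(host, keyspace_table):
    # Parse the partition-count estimate for one table out of nodetool cfstats.
    out = subprocess.check_output(
        ['nodetool', '-h', host, 'cfstats', keyspace_table], text=True)
    m = re.search(r'Number of partitions \(estimate\):\s*(\d+)', out)
    return int(m.group(1)) if m else 0

src = estimated_partitions('10.0.2.194', 'alternator_usertable_streams.usertable_streams')
dst = estimated_partitions('10.0.2.194', 'alternator_usertable_streams-dest.usertable_streams-dest')
print(f'== dst table/src table number of partitions: {dst}/{src} ==')
```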
An updated problem scenario is:
[ForkJoinPool-1-worker-30] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:9b7bd2e14eb4b787aab386d4d4000a81
kcl-l0-c0-307f6c62-24a2-4c86-a98b-d3c4cdaeefc6.log
KCL log has:
Creating table usertable_streams
Creating table usertable_streams-dest
2021-02-04 15:03:18.626 [main] INFO StreamsAdapterDemo - Table is active.
2021-02-04 15:03:18.628 [main] INFO StreamsAdapterDemo - Table is active.
2021-02-04 15:03:18.638 [main] INFO StreamsAdapterDemo - Creating worker for stream: S1aee8af0-66fa-11eb-b03f-0c9d930cf12c
2021-02-04 15:03:18.646 [main] WARN c.a.s.k.c.lib.worker.Worker - No configuration received for endpoint and region, will default region to us-east-1
2021-02-04 15:03:18.677 [main] INFO StreamsAdapterDemo - Starting worker...
2021-02-04 15:03:18.677 [main] INFO StreamsAdapterDemo - Sleeping for 32400sec
2021-02-04 15:16:34.841 [ForkJoinPool-1-worker-18] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:00671f8de2f6d6a7f8c22a5d7c003191
2021-02-04 15:17:07.376 [ForkJoinPool-1-worker-11] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:c8f0ed125ef6504451c19d4af8001e21
2021-02-04 15:17:35.499 [ForkJoinPool-1-worker-15] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:1c48d7777352dae5f3b04f5ccc003cc1
2021-02-04 15:18:13.027 [ForkJoinPool-1-worker-23] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:9d73d46ec93094f3aab721e3c8000b61
2021-02-04 15:18:42.285 [ForkJoinPool-1-worker-22] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:b7c84c581ef7d5f4bc30429b9c001691
2021-02-04 15:19:12.073 [ForkJoinPool-1-worker-17] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:6f5a53eee91357bd59af546cdc005ab1
2021-02-04 15:19:40.071 [ForkJoinPool-1-worker-30] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H1776d8470d7:9b7bd2e14eb4b787aab386d4d4000a81
See the attached Grafana screenshots for these Alternator metrics, and see also the KCL log file.
One addition: while seeing that DescribeStream is being called again and again, we tried it manually to see the output:
aws dynamodbstreams describe-stream --stream-arn S1aee8af0-66fa-11eb-b03f-0c9d930cf12c --endpoint-url http://52.17.134.59:8080
{
"StreamDescription": {
"StreamArn": "S1aee8af0-66fa-11eb-b03f-0c9d930cf12c",
"StreamStatus": "ENABLED",
"StreamViewType": "NEW_IMAGE",
"TableName": "usertable_streams",
"KeySchema": [
{
"AttributeName": "p",
"KeyType": "HASH"
}
],
"Shards": [
{
"ShardId": "H1776d8470d7:003e003ecc0e5ca8b3f18756c0003181",
"SequenceNumberRange": {
"StartingSequenceNumber": "2551506219633207326769402383851094144",
"EndingSequenceNumber": "2551506498337500371376796833832534144"
}
},
... more similar looking shards with the same start and end ....
],
"LastEvaluatedShardId": "H1776d8470d7:01049249249249257aa9593f240031f1"
}
}
I'm guessing (since GetShardIterator wasn't called again) that KCL isn't o.k. with those sequence numbers, or that they aren't updated and it compares them with sequences it has already been reading. (I didn't yet look up the origin of the "Dropped lease for shutting down" print, to try to find the logic that leads to it.)
@fruch - I would like to see the full dump of the above stream description. And also continued after the last-evaluated shard id.
@elcallio I'm spinning it again, so you can see it live: https://jenkins.scylladb.com/job/scylla-staging/job/Longevity_yaron/job/longevity-alternator-streams-with-nemesis/18/
I'll send details once it reaches that point; I don't have dumps of the API from those runs.
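For what it's worth, the full dump (continuing after LastEvaluatedShardId, as requested) takes only a few lines of boto3; a sketch, reusing the endpoint and ARN from the manual describe-stream example above (the credentials are placeholders):

```python
import json

import boto3

streams = boto3.client('dynamodbstreams',
                       endpoint_url='http://52.17.134.59:8080',
                       region_name='us-east-1',
                       aws_access_key_id='alternator',
                       aws_secret_access_key='secret_pass')

arn = 'S1aee8af0-66fa-11eb-b03f-0c9d930cf12c'
shards, kwargs = [], {}
while True:
    desc = streams.describe_stream(StreamArn=arn, **kwargs)['StreamDescription']
    shards.extend(desc['Shards'])
    last = desc.get('LastEvaluatedShardId')
    if not last:                               # no more pages
        break
    kwargs = {'ExclusiveStartShardId': last}   # continue after last-eval-id

open_shards = [s for s in shards
               if 'EndingSequenceNumber' not in s['SequenceNumberRange']]
print(f'{len(shards)} shards total, {len(open_shards)} open')
print(json.dumps(shards, indent=2))
```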
I thought maybe this issue is related to the fact that we weren't using checkpoints, and I tried it with this change: https://github.com/fruch/hydra-kcl/pull/4
It didn't help, and we still lost the streaming the first time the topology was changed.
@nyh @elcallio, while @yarongilor is working on a dtest reproducer for this, we need more ideas on how to get to the bottom of this...
I'd say add verbose logging to the shard iteration inside KCL, i.e. print what you query, the result, and where you go from there. It seems you get stuck on either an old generation (why so many? nemesis node kills?) or ids that do not validate.
@elcallio care to give me a hand with which parts you think the logs would be helpful for, before I give it a go? https://github.com/fruch/hydra-kcl/blob/master/src/main/resources/logback.xml#L25
Sorry, you will probably need to read the library code to know that. It's not in my head.
@elcallio, @nyh, @fruch, please advise whether the above draft-reproducer PR helps in some way to get more reproduction info. (https://github.com/scylladb/scylla-dtest/pull/1977)
@elcallio After looking again into the docs of DynamoDB Streams:
> EndingSequenceNumber
> The ending sequence number for the range. Shards that are in the OPEN state have an ending sequence number of null.
Could it be that once we do a topology change, all EndingSequenceNumbers are not null? i.e. KCL might consider all of the shards closed?
The last CDC generation should always generate a set of shards with end = null (not set). We don't really have a unit test for this, but we should. It is worth noting that sequence numbers are considered shard- and generation-local, i.e. they can appear in more than one shard and generation (they are, after all, just timestamps in our case). I think this is the case even in original DynamoDB Streams.
That's exactly why we are trying to build a reproducer in dtest (or test it in SCT with the KCL, to begin with).
@elcallio would it help to bring up a cluster again, to verify this directly? If so, please be in touch with @yarongilor to spin it back up as needed.
> The last CDC generation should always generate a set of shards with end = null (not set). We don't really have a unit test for this, but should.
I think this can be easily done as a dtest - you create a cluster, list the stream's shards and remember them. Now you add a node to the cluster, and list the shards again. This list should contain all the old shards, plus some new ones. All the new ones should have no EndingSequenceNumber (and I think all the old ones should have an EndingSequenceNumber now).
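A minimal sketch of that dtest check, assuming a list_all_shards(streams, arn) helper like the DescribeStream pager sketched above, and a hypothetical add_node_to_cluster() for the topology change:

```python
# Hypothetical helpers: list_all_shards() pages DescribeStream as sketched
# above; add_node_to_cluster() performs the topology change under test.
def open_and_closed_ids(shards):
    open_ids = {s['ShardId'] for s in shards
                if 'EndingSequenceNumber' not in s['SequenceNumberRange']}
    return open_ids, {s['ShardId'] for s in shards} - open_ids

before_open, _ = open_and_closed_ids(list_all_shards(streams, arn))
add_node_to_cluster()
after_open, after_closed = open_and_closed_ids(list_all_shards(streams, arn))

# The new generation must expose open shards (end == null)...
assert after_open - before_open, 'no new open shards after the topology change'
# ...and the previously-open shards should now be closed.
assert before_open <= after_closed, 'old-generation shards were not closed'
```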
It can be done even more simply in a unit test. The last (possibly the only) generation should always have null end sequences.
@elcallio my point is that an external test - not written in C++ with some secret access to internal information - doesn't know what "the last generation" is (right?) - so the sequence of events I described was so that the test can know what this "last generation" is: the new shards that suddenly appear after a node is added. If you have this extra internal information in the C++ unit tests then yes, it can be simpler (but it relies on implementation details).
@nyh I was talking with @yarongilor yesterday about doing exactly that in dtest (and our talk was the reason we reread the docs again).
No it doesn't. It just verifies we get one set of shards with end == null. No special knowledge - just the pure DynamoDB Streams API.
Running a small test of it, it looks like at any given moment there are 1024 open shards, while the total number of shards increases by an additional 1024 shards on each topology change. Sample test log output is:
2021-02-25 12:25:58.623781 Pre setup - Creating "user_table" table via "node1" node with "{'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}}" stream key
2021-02-25 12:25:58.727871 Creating a new table 'user_table' using node 'node1'..
2021-02-25 12:26:00.950263 The table 'user_table' successfully created..
2021-02-25 12:26:00.953993 Table's schema and configuration are: {'Table': {'AttributeDefinitions': [{'AttributeName': 'hash_key', 'AttributeType': 'S'}], 'TableName': 'user_table', 'KeySchema': [{'AttributeName': 'hash_key', 'KeyType': 'HASH'}], 'TableStatus': 'ACTIVE', 'CreationDateTime': datetime.datetime(2021, 2, 25, 12, 26, tzinfo=tzlocal()), 'TableArn': 'arn:scylla:alternator:alternator_user_table:scylla:table/user_table', 'TableId': 'd9f030c0-7753-11eb-a2e9-49dcf2e9eec8', 'BillingModeSummary': {'BillingMode': 'PAY_PER_REQUEST', 'LastUpdateToPayPerRequestDateTime': datetime.datetime(2021, 2, 25, 12, 26, tzinfo=tzlocal())}, 'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}, 'LatestStreamLabel': '"2021-02-25T12:25:58Z"', 'LatestStreamArn': 'Sda02ce60-7753-11eb-a2e9-49dcf2e9eec8'}, 'ResponseMetadata': {'HTTPStatusCode': 200, 'HTTPHeaders': {'content-length': '634', 'content-type': 'application/json', 'date': 'Thu, 25 Feb 2021 10:26:00 GMT', 'server': 'Seastar httpd'}, 'RetryAttempts': 0}}
2021-02-25 12:26:00.954651 Adding new 400 items to table 'user_table'..
2021-02-25 12:26:01.271578 trying get_stream_arn [1/60] (The stream ARN of 'user_table' table not found)
2021-02-25 12:26:01.291290 Waiting until stream of "user_table" table be active
2021-02-25 12:26:01.291470 trying get_stream_arn [1/60] (The stream ARN of 'user_table' table not found)
2021-02-25 12:26:01.354776 Start Alternator stress of put_table_items..
Using parameters of: {'table_name': 'user_table', 'node': <ccmlib.scylla_node.ScyllaNode object at 0x7fc6fa281d60>, 'num_of_items': 100, 'verbose': False}
2021-02-25 12:26:01.356105 Running put_table_items...
2021-02-25 12:26:01.366810 Start decommission thread of _run_decommission_add_node_loop..
2021-02-25 12:26:01.375397 Running _run_decommission_add_node_loop...
2021-02-25 12:26:01.375629 Decommissioning node4..
2021-02-25 12:26:01.968918 Shards updated status is:
2021-02-25 12:26:01.969040 Existing number of shards [0] is updated to: [1024]
2021-02-25 12:26:01.970111 Existing number of open shards [0] is updated to: [1024]
2021-02-25 12:26:03.090101 Shards updated status is:
2021-02-25 12:26:03.090330 Existing number of shards [1024] is updated to: [1024]
2021-02-25 12:26:03.094255 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:26:18.655721 Shards updated status is:
2021-02-25 12:26:18.655846 Existing number of shards [1024] is updated to: [1024]
2021-02-25 12:26:18.657774 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:26:24.153712 Adding new node to cluster..
2021-02-25 12:26:34.349791 Shards updated status is:
2021-02-25 12:26:34.349935 Existing number of shards [1024] is updated to: [1024]
2021-02-25 12:26:34.352423 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:26:49.906273 Shards updated status is:
2021-02-25 12:26:49.906378 Existing number of shards [1024] is updated to: [1024]
2021-02-25 12:26:49.908600 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:27:05.699200 Shards updated status is:
2021-02-25 12:27:05.699389 Existing number of shards [1024] is updated to: [2048]
2021-02-25 12:27:05.703279 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:27:21.920970 Shards updated status is:
2021-02-25 12:27:21.921084 Existing number of shards [2048] is updated to: [2048]
2021-02-25 12:27:21.927367 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:27:28.666618 Node successfully added!
2021-02-25 12:27:37.658836 Shards updated status is:
2021-02-25 12:27:37.658948 Existing number of shards [2048] is updated to: [2048]
2021-02-25 12:27:37.662927 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:27:43.681446 Decommissioning node5..
2021-02-25 12:27:53.811384 Shards updated status is:
2021-02-25 12:27:53.811537 Existing number of shards [2048] is updated to: [2048]
2021-02-25 12:27:53.826324 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:28:09.144546 Adding new node to cluster..
2021-02-25 12:28:10.038065 Shards updated status is:
2021-02-25 12:28:10.038279 Existing number of shards [2048] is updated to: [2048]
2021-02-25 12:28:10.043298 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:28:26.074693 Shards updated status is:
2021-02-25 12:28:26.074900 Existing number of shards [2048] is updated to: [2048]
2021-02-25 12:28:26.080218 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:28:43.057119 Shards updated status is:
2021-02-25 12:28:43.058037 Existing number of shards [2048] is updated to: [3072]
2021-02-25 12:28:43.064048 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:28:59.796794 Shards updated status is:
2021-02-25 12:28:59.799781 Existing number of shards [3072] is updated to: [3072]
2021-02-25 12:28:59.845306 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:29:16.598609 Node successfully added!
2021-02-25 12:29:16.630708 Shards updated status is:
2021-02-25 12:29:16.630850 Existing number of shards [3072] is updated to: [3072]
2021-02-25 12:29:16.640763 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:29:31.618500 Decommissioning node6..
2021-02-25 12:29:33.914756 Shards updated status is:
2021-02-25 12:29:33.914918 Existing number of shards [3072] is updated to: [3072]
2021-02-25 12:29:33.924050 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:29:51.056645 Shards updated status is:
2021-02-25 12:29:51.056794 Existing number of shards [3072] is updated to: [3072]
2021-02-25 12:29:51.063571 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:30:00.724011 Adding new node to cluster..
2021-02-25 12:30:07.547733 Shards updated status is:
2021-02-25 12:30:07.547888 Existing number of shards [3072] is updated to: [3072]
2021-02-25 12:30:07.556992 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:30:23.881786 Shards updated status is:
2021-02-25 12:30:23.882078 Existing number of shards [3072] is updated to: [3072]
2021-02-25 12:30:23.889359 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:30:41.056615 Shards updated status is:
2021-02-25 12:30:41.056789 Existing number of shards [3072] is updated to: [4096]
2021-02-25 12:30:41.070664 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:30:58.897867 Shards updated status is:
2021-02-25 12:30:58.898102 Existing number of shards [4096] is updated to: [4096]
2021-02-25 12:30:58.922520 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:31:09.595575 Node successfully added!
2021-02-25 12:31:15.676255 Shards updated status is:
2021-02-25 12:31:15.676392 Existing number of shards [4096] is updated to: [4096]
2021-02-25 12:31:15.687763 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:31:24.610455 Decommissioning node7..
2021-02-25 12:31:33.051741 Shards updated status is:
2021-02-25 12:31:33.051846 Existing number of shards [4096] is updated to: [4096]
2021-02-25 12:31:33.060408 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:31:53.692776 Shards updated status is:
2021-02-25 12:31:53.692905 Existing number of shards [4096] is updated to: [4096]
2021-02-25 12:31:53.705916 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:32:01.604710 Adding new node to cluster..
2021-02-25 12:32:10.162478 Shards updated status is:
2021-02-25 12:32:10.162786 Existing number of shards [4096] is updated to: [4096]
2021-02-25 12:32:10.171605 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:32:27.811088 Shards updated status is:
2021-02-25 12:32:27.811302 Existing number of shards [4096] is updated to: [4096]
2021-02-25 12:32:27.819995 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:32:45.537272 Shards updated status is:
2021-02-25 12:32:45.537446 Existing number of shards [4096] is updated to: [5120]
2021-02-25 12:32:45.550380 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:33:06.497300 Shards updated status is:
2021-02-25 12:33:06.502508 Existing number of shards [5120] is updated to: [5120]
2021-02-25 12:33:06.521749 Existing number of open shards [1024] is updated to: [1024]
2021-02-25 12:33:12.097748 Node successfully added!
A possible reproduction was encountered with a revised test. @elcallio, @fruch, @nyh, please advise. The reproduction test scenario is (see the sketch after this list):
- create a table for Streams.
- run an Alternator write-stress thread in the background.
- run another background thread that decommissions a node, then adds a new node.
- run a main loop that calls the Streams APIs, checking for existing open shards and for a diff in the open-shards records (to verify a change from the previous cycle).
The test searches for records of open shards only. It gets their shard iterators using the TRIM_HORIZON type, then queries for their records. The first few cycles returned record content; then, after several cycles, none of the shard iterators returned a response with records, so an assertion for existing records failed the test.
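A condensed sketch of that record check, again assuming the hypothetical list_all_shards() pager helper from earlier:

```python
# Read records from open shards only, starting each shard at TRIM_HORIZON.
def read_open_shard_records(streams, arn):
    records = []
    for shard in list_all_shards(streams, arn):
        if 'EndingSequenceNumber' in shard['SequenceNumberRange']:
            continue                                   # skip closed shards
        it = streams.get_shard_iterator(
            StreamArn=arn,
            ShardId=shard['ShardId'],
            ShardIteratorType='TRIM_HORIZON')['ShardIterator']
        records.extend(streams.get_records(ShardIterator=it)['Records'])
    return records

# The assertion that fails after a few cycles once a decommission has run:
assert read_open_shard_records(streams, arn), 'No open shards records found'
```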
test can be found in: https://github.com/yarongilor/scylla-dtest/blob/b2e9d608d70910ed31bfc7da8c6d2f1441e332c7/alternator_stream_tests.py#L97
test output:
2021-02-28 22:07:45.470138 - Starting Scylla cluster version master_2021-02-15T11_11_32Z
2021-02-28 22:07:45.473380 - Allocated cluster ID 44: /home/yarongilor/.dtest/dtest-zvtxwjw5
2021-02-28 22:07:45.532513 - Scylla mode is 'release'
2021-02-28 22:07:45.532611 - Cluster *_request_timeout_in_ms=10000, range_request_timeout_in_ms=30000, cql request_timeout=30
2021-02-28 22:07:45.540136 - Starting node 1
2021-02-28 22:08:12.301486 - Starting node 2
2021-02-28 22:09:14.807171 - Starting node 3
2021-02-28 22:10:17.755874 - Starting node 4
2021-02-28 22:11:21.236767 - Pre setup - Creating "user_table" table via "node1" node with "{'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}}" stream key
2021-02-28 22:11:28.528368 - Creating a new table 'user_table' using node 'node1'..
2021-02-28 22:11:30.141509 - The table 'user_table' successfully created..
2021-02-28 22:11:30.144094 - Table's schema and configuration are: {'Table': {'AttributeDefinitions': [{'AttributeName': 'hash_key', 'AttributeType': 'S'}], 'TableName': 'user_table', 'KeySchema': [{'AttributeName': 'hash_key', 'KeyType': 'HASH'}], 'TableStatus': 'ACTIVE', 'CreationDateTime': datetime.datetime(2021, 2, 28, 22, 11, 30, tzinfo=tzlocal()), 'TableArn': 'arn:scylla:alternator:alternator_user_table:scylla:table/user_table', 'TableId': '242e4460-7a01-11eb-8ad0-a0dea5f52420', 'BillingModeSummary': {'BillingMode': 'PAY_PER_REQUEST', 'LastUpdateToPayPerRequestDateTime': datetime.datetime(2021, 2, 28, 22, 11, 30, tzinfo=tzlocal())}, 'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}, 'LatestStreamLabel': '"2021-02-28T22:11:28Z"', 'LatestStreamArn': 'S2430dc70-7a01-11eb-8ad0-a0dea5f52420'}, 'ResponseMetadata': {'HTTPStatusCode': 200, 'HTTPHeaders': {'content-length': '634', 'content-type': 'application/json', 'date': 'Sun, 28 Feb 2021 20:11:29 GMT', 'server': 'Seastar httpd'}, 'RetryAttempts': 0}}
2021-02-28 22:11:30.144298 - Adding new 100 items to table 'user_table'..
2021-02-28 22:11:30.246841 - trying get_stream_arn [1/60] (The stream ARN of 'user_table' table not found)
2021-02-28 22:11:30.260607 - Start Alternator stress of put_table_items..
Using parameters of: {'table_name': 'user_table', 'node': <ccmlib.scylla_node.ScyllaNode object at 0x7f8ce8563b50>, 'num_of_items': 1000, 'verbose': False}
2021-02-28 22:11:30.260865 - Running put_table_items...
2021-02-28 22:11:30.265326 - Start decommission thread of run_decommission_then_add_node..
2021-02-28 22:11:30.265589 - Running run_decommission_then_add_node...
2021-02-28 22:11:30.265672 - Decommissioning node4..
2021-02-28 22:11:30.851040 - Shards updated status is:
2021-02-28 22:11:30.851150 - Existing number of shards [0] is updated to: [1024]
2021-02-28 22:11:30.851685 - Existing number of open shards [0] is updated to: [1024]
2021-02-28 22:11:31.531979 - Shards updated status is:
2021-02-28 22:11:31.532030 - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:11:31.532526 - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:11:34.274969 - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:11:52.845465 - Adding new node to cluster..
2021-02-28 22:11:56.441125 - Found 15212 records for 1024 shard iterators.
2021-02-28 22:11:56.989906 - Records diff is:
2021-02-28 22:11:56.990005 - iterable_item_added: 996
2021-02-28 22:12:29.731076 - Shards updated status is:
2021-02-28 22:12:29.731142 - Existing number of shards [1024] is updated to: [2048]
2021-02-28 22:12:29.731875 - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:12:32.507168 - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:12:55.672117 - Found 0 records for 1024 shard iterators.
2021-02-28 22:12:55.735712 - Test failed with errors: [(<alternator_stream_tests.AlternatorStreamsTest testMethod=test_updated_shards_during_add_decommission_node>, (<class 'AssertionError'>, AssertionError('No open shards records found'), <traceback object at 0x7f8cc66051c0>))]
As a comparison/validation check, the issue is not reproduced when running the same test without the background decommission thread:
2021-02-28 22:38:34.657438 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 1
2021-02-28 22:39:01.084446 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 2
2021-02-28 22:40:03.062533 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 3
2021-02-28 22:41:05.445600 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 4
2021-02-28 22:42:07.675809 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Pre setup - Creating "user_table" table via "node1" node with "{'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}}" stream key
2021-02-28 22:42:07.873418 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Creating a new table 'user_table' using node 'node1'..
2021-02-28 22:42:09.467958 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - The table 'user_table' successfully created..
2021-02-28 22:42:09.471575 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Table's schema and configuration are: {'Table': {'AttributeDefinitions': [{'AttributeName': 'hash_key', 'AttributeType': 'S'}], 'TableName': 'user_table', 'KeySchema': [{'AttributeName': 'hash_key', 'KeyType': 'HASH'}], 'TableStatus': 'ACTIVE', 'CreationDateTime': datetime.datetime(2021, 2, 28, 22, 42, 9, tzinfo=tzlocal()), 'TableArn': 'arn:scylla:alternator:alternator_user_table:scylla:table/user_table', 'TableId': '6c814b50-7a05-11eb-96a3-b9b283df594d', 'BillingModeSummary': {'BillingMode': 'PAY_PER_REQUEST', 'LastUpdateToPayPerRequestDateTime': datetime.datetime(2021, 2, 28, 22, 42, 9, tzinfo=tzlocal())}, 'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}, 'LatestStreamLabel': '"2021-02-28T22:42:07Z"', 'LatestStreamArn': 'S6c834720-7a05-11eb-96a3-b9b283df594d'}, 'ResponseMetadata': {'HTTPStatusCode': 200, 'HTTPHeaders': {'content-length': '634', 'content-type': 'application/json', 'date': 'Sun, 28 Feb 2021 20:42:09 GMT', 'server': 'Seastar httpd'}, 'RetryAttempts': 0}}
2021-02-28 22:42:09.471882 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new 100 items to table 'user_table'..
2021-02-28 22:42:09.521999 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - trying get_stream_arn [1/60] (The stream ARN of 'user_table' table not found)
2021-02-28 22:42:09.537825 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Start Alternator stress of put_table_items..
Using parameters of: {'table_name': 'user_table', 'node': <ccmlib.scylla_node.ScyllaNode object at 0x7f230203fb50>, 'num_of_items': 1000, 'verbose': False}
2021-02-28 22:42:09.538091 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Running put_table_items...
2021-02-28 22:42:10.027383 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:42:10.027471 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [0] is updated to: [1024]
2021-02-28 22:42:10.027745 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [0] is updated to: [1024]
2021-02-28 22:42:10.655643 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:42:10.655712 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:42:10.656377 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:42:13.513601 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:42:35.614614 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 26032 records for 1024 shard iterators.
2021-02-28 22:42:36.288177 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:42:36.288290 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 1000
2021-02-28 22:43:07.416283 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:43:07.416910 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:43:07.419720 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:43:10.234157 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:43:44.411341 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 39040 records for 1024 shard iterators.
2021-02-28 22:43:46.314524 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:43:46.314624 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 142
2021-02-28 22:44:16.667640 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:44:16.667731 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:44:16.668326 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:44:19.554951 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:44:52.128172 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 25090 records for 1024 shard iterators.
2021-02-28 22:44:53.557562 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:44:53.557629 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 126
2021-02-28 22:44:53.557663 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 2
2021-02-28 22:45:23.951322 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:45:23.951387 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:45:23.952206 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:45:26.932139 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:46:16.035860 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 52832 records for 1024 shard iterators.
2021-02-28 22:46:18.370925 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:46:18.370995 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 17
2021-02-28 22:46:18.371032 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 67
2021-02-28 22:46:48.794302 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:46:48.794361 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:46:48.795071 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:46:51.822724 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:47:59.030950 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 56824 records for 1024 shard iterators.
2021-02-28 22:48:01.879236 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:48:01.879312 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 68
2021-02-28 22:48:01.882410 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 3
2021-02-28 22:48:32.377957 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:48:32.378040 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:48:32.378848 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:48:35.215964 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:49:14.534186 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 31260 records for 1024 shard iterators.
2021-02-28 22:49:16.710381 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:49:16.710461 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 3
2021-02-28 22:49:16.710504 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 45
2021-02-28 22:49:47.184573 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:49:47.185068 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:49:47.185831 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:49:49.818557 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:50:34.829131 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 37663 records for 1024 shard iterators.
2021-02-28 22:50:36.677670 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:50:36.681053 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 45
2021-02-28 22:51:07.139664 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:51:07.139717 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:51:07.140604 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:51:10.049777 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:52:04.786331 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 53015 records for 1024 shard iterators.
2021-02-28 22:52:07.038115 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:52:07.038958 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 45
2021-02-28 22:52:37.534385 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:52:37.534484 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:52:37.535195 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:52:40.227533 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:53:24.571288 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 45339 records for 1024 shard iterators.
2021-02-28 22:53:27.635824 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:53:27.635894 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 45
2021-02-28 22:53:58.093358 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 22:53:58.093408 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 22:53:58.094448 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 22:54:01.123506 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 22:54:56.846935 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 68625 records for 1024 shard iterators.
2021-02-28 22:54:59.733642 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 22:54:59.733701 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 3
2021-02-28 22:55:29.826478 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Stopping cluster
2021-02-28 22:56:24.757699 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - stopping ccm cluster test at: /home/yarongilor/.dtest/dtest-2vfds6ri
2021-02-28 22:56:24.829742 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Freeing cluster ID 39: link /home/yarongilor/.dtest/39
on another run, we got a read-timeout exception from Streams trying to read the CDC table:
2021-02-28 23:35:18.981871 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - cluster ccm directory: /home/yarongilor/.dtest/dtest-3sy6s1g9
2021-02-28 23:35:18.981973 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting Scylla cluster version master_2021-02-15T11_11_32Z
test_updated_shards_during_add_decommission_node (alternator_stream_tests.AlternatorStreamsTest) ... 2021-02-28 23:35:19.008124 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Allocated cluster ID 34: /home/yarongilor/.dtest/dtest-3sy6s1g9
2021-02-28 23:35:19.089650 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Scylla mode is 'release'
2021-02-28 23:35:19.089737 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Cluster *_request_timeout_in_ms=10000, range_request_timeout_in_ms=30000, cql request_timeout=30
2021-02-28 23:35:19.112288 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 1
2021-02-28 23:35:55.552312 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 2
2021-02-28 23:36:57.787020 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 3
2021-02-28 23:38:00.381024 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting node 4
2021-02-28 23:39:02.954116 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Pre setup - Creating "user_table" table via "node1" node with "{'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}}" stream key
2021-02-28 23:39:03.035801 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Creating a new table 'user_table' using node 'node1'..
2021-02-28 23:39:05.855278 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - The table 'user_table' successfully created..
2021-02-28 23:39:05.858662 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Table's schema and configuration are: {'Table': {'AttributeDefinitions': [{'AttributeName': 'hash_key', 'AttributeType': 'S'}], 'TableName': 'user_table', 'KeySchema': [{'AttributeName': 'hash_key', 'KeyType': 'HASH'}], 'TableStatus': 'ACTIVE', 'CreationDateTime': datetime.datetime(2021, 2, 28, 23, 39, 5, tzinfo=tzlocal()), 'TableArn': 'arn:scylla:alternator:alternator_user_table:scylla:table/user_table', 'TableId': '6019d000-7a0d-11eb-89a3-c38ffacd3ea6', 'BillingModeSummary': {'BillingMode': 'PAY_PER_REQUEST', 'LastUpdateToPayPerRequestDateTime': datetime.datetime(2021, 2, 28, 23, 39, 5, tzinfo=tzlocal())}, 'StreamSpecification': {'StreamEnabled': True, 'StreamViewType': 'KEYS_ONLY'}, 'LatestStreamLabel': '"2021-02-28T23:39:03Z"', 'LatestStreamArn': 'S603c7330-7a0d-11eb-89a3-c38ffacd3ea6'}, 'ResponseMetadata': {'HTTPStatusCode': 200, 'HTTPHeaders': {'content-length': '634', 'content-type': 'application/json', 'date': 'Sun, 28 Feb 2021 21:39:05 GMT', 'server': 'Seastar httpd'}, 'RetryAttempts': 0}}
2021-02-28 23:39:05.859767 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new 100 items to table 'user_table'..
2021-02-28 23:39:05.947565 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - trying get_stream_arn [1/60] (The stream ARN of 'user_table' table not found)
2021-02-28 23:39:05.965397 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Start Alternator stress of put_table_items..
Using parameters of: {'table_name': 'user_table', 'node': <ccmlib.scylla_node.ScyllaNode object at 0x7f58c7ed5d30>, 'num_of_items': 1000, 'verbose': False}
2021-02-28 23:39:05.966150 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Running put_table_items...
2021-02-28 23:39:05.979404 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Start decommission thread of run_decommission_then_add_node..
2021-02-28 23:39:05.979953 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Running run_decommission_then_add_node...
2021-02-28 23:39:05.980104 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Decommissioning node4..
2021-02-28 23:39:06.486024 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:39:06.486388 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [0] is updated to: [1024]
2021-02-28 23:39:06.487496 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [0] is updated to: [1024]
2021-02-28 23:39:06.487623 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #1..
2021-02-28 23:39:07.150922 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:39:07.151082 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [1024]
2021-02-28 23:39:07.153275 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:39:12.061712 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:39:27.970388 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new node to cluster..
2021-02-28 23:39:38.743387 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 19384 records for 2048 shard iterators.
2021-02-28 23:39:40.547916 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 23:39:40.548158 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 1000
2021-02-28 23:40:10.577976 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #2..
2021-02-28 23:40:11.678732 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:40:11.679013 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [1024] is updated to: [2048]
2021-02-28 23:40:11.693952 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:40:22.260551 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:40:33.028726 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Node successfully added!
2021-02-28 23:40:38.040240 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Decommissioning node5..
2021-02-28 23:40:44.368490 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 4882 records for 3072 shard iterators.
2021-02-28 23:40:46.421692 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 23:40:46.421884 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 139
2021-02-28 23:41:06.153941 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new node to cluster..
2021-02-28 23:41:16.451749 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #3..
2021-02-28 23:41:17.350494 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:41:17.350691 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [2048] is updated to: [2048]
2021-02-28 23:41:17.356670 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:41:22.445334 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:42:12.347512 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Node successfully added!
2021-02-28 23:42:17.361723 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Decommissioning node6..
2021-02-28 23:42:25.899223 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 49106 records for 2048 shard iterators.
2021-02-28 23:42:30.233951 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 23:42:30.234143 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 139
2021-02-28 23:42:43.404626 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new node to cluster..
2021-02-28 23:43:00.268331 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #4..
2021-02-28 23:43:04.654017 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:43:04.654134 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [2048] is updated to: [3072]
2021-02-28 23:43:04.659551 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:43:09.391361 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:44:01.182274 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Node successfully added!
2021-02-28 23:44:06.205544 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Decommissioning node7..
2021-02-28 23:44:13.097143 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 37920 records for 2048 shard iterators.
2021-02-28 23:44:19.066895 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 23:44:19.067013 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 3
2021-02-28 23:44:41.554486 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new node to cluster..
2021-02-28 23:44:49.083946 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #5..
2021-02-28 23:44:57.273254 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:44:57.273380 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [3072] is updated to: [4096]
2021-02-28 23:44:57.283666 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:45:02.187139 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:45:51.297199 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Node successfully added!
2021-02-28 23:45:56.302194 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Decommissioning node8..
2021-02-28 23:46:23.336680 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 40546 records for 2048 shard iterators.
2021-02-28 23:46:26.214606 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new node to cluster..
2021-02-28 23:46:29.565062 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 23:46:29.565188 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 3
2021-02-28 23:46:59.594960 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #6..
2021-02-28 23:47:01.887296 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:47:01.887970 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [4096] is updated to: [6144]
2021-02-28 23:47:01.898940 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:47:06.377222 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:47:46.237578 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 6 records for 3072 shard iterators.
2021-02-28 23:47:50.533481 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 23:47:50.533633 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_removed: 997
2021-02-28 23:47:55.558881 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Node successfully added!
2021-02-28 23:48:00.568826 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Decommissioning node9..
2021-02-28 23:48:20.563822 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #7..
2021-02-28 23:48:52.337358 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:48:52.337489 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [6144] is updated to: [6144]
2021-02-28 23:48:52.350800 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:48:57.148758 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:48:59.669771 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Adding new node to cluster..
2021-02-28 23:50:19.479974 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Node successfully added!
2021-02-28 23:50:24.484702 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Decommissioning node9..
2021-02-28 23:50:59.958509 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 32596 records for 2048 shard iterators.
2021-02-28 23:51:01.659345 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Records diff is:
2021-02-28 23:51:01.659591 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - iterable_item_added: 997
2021-02-28 23:51:31.689832 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Starting cycle #8..
2021-02-28 23:51:36.203591 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Shards updated status is:
2021-02-28 23:51:36.203970 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of shards [6144] is updated to: [7168]
2021-02-28 23:51:36.238262 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Existing number of open shards [1024] is updated to: [1024]
2021-02-28 23:51:38.935224 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Found 1024 shard iterators for 1024 shards.
2021-02-28 23:52:44.332810 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Test failed with errors: [(<alternator_stream_tests.AlternatorStreamsTest testMethod=test_updated_shards_during_add_decommission_node>, (<class 'botocore.errorfactory.InternalServerError'>, InternalServerError('An error occurred (InternalServerError) when calling the GetRecords operation (reached max retries: 4): Internal server error: exceptions::read_timeout_exception (Operation timed out for alternator_user_table.user_table_scylla_cdc_log - received only 1 responses from 2 CL=LOCAL_QUORUM.)'), <traceback object at 0x7f58920b1140>))]
2021-02-28 23:52:44.333235 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Stopping cluster
2021-03-01 00:00:16.769839 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Test failed with errors: [(<alternator_stream_tests.AlternatorStreamsTest testMethod=test_updated_shards_during_add_decommission_node>, (<class 'botocore.errorfactory.InternalServerError'>, InternalServerError('An error occurred (InternalServerError) when calling the GetRecords operation (reached max retries: 4): Internal server error: exceptions::read_timeout_exception (Operation timed out for alternator_user_table.user_table_scylla_cdc_log - received only 1 responses from 2 CL=LOCAL_QUORUM.)'), <traceback object at 0x7f58920b1140>)), (<alternator_stream_tests.AlternatorStreamsTest testMethod=test_updated_shards_during_add_decommission_node>, (<class 'ccmlib.node.NodeError'>, NodeError('Problem stopping node node1'), <traceback object at 0x7f5893ebf980>)), (<alternator_stream_tests.AlternatorStreamsTest testMethod=test_updated_shards_during_add_decommission_node>, (<class 'ccmlib.node.NodetoolError'>, NodetoolError("Nodetool command '/home/yarongilor/.ccm/scylla-repository/master_2021-02-15T11_11_32Z/scylla-tools-java/bin/nodetool -h 127.0.34.9 -p 7934 decommission' failed; exit status: 1; stdout: nodetool: Scylla API server HTTP POST to URL '/storage_service/decommission' failed: std::runtime_error (local node is not a member of the token ring yet)\nSee 'nodetool help' or 'nodetool help <command>'.\n"), <traceback object at 0x7f5893ebff40>)), (<alternator_stream_tests.AlternatorStreamsTest testMethod=test_updated_shards_during_add_decommission_node>, (<class 'botocore.exceptions.EndpointConnectionError'>, EndpointConnectionError('Could not connect to the endpoint URL: "http://127.0.34.1:8080/"'), <traceback object at 0x7f5893ebf3c0>))]
2021-03-01 00:00:16.770248 alternator_stream_tests.py:AlternatorStreamsTest.test_updated_shards_during_add_decommission_node - Stopping cluster
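For reference, the "(reached max retries: 4)" in that error is botocore's retry message, so the retry budget is a client-side knob. A minimal sketch - assuming boto3 is used to talk to Alternator, with a hypothetical endpoint and dummy credentials - of giving the dynamodbstreams client a larger retry budget and a longer read timeout, so a coordinator briefly missing quorum during a topology change is retried instead of failing the run:

import boto3
from botocore.config import Config

# Hypothetical endpoint - in these tests it would be one of the ccm nodes.
cfg = Config(retries={'max_attempts': 10, 'mode': 'standard'},
             connect_timeout=10, read_timeout=60)
streams = boto3.client('dynamodbstreams',
                       endpoint_url='http://127.0.0.1:8080',
                       region_name='us-east-1',
                       aws_access_key_id='alternator',   # dummy credentials
                       aws_secret_access_key='secret',   # for a local cluster
                       config=cfg)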
Why would you (a) be sure to even get data in the new, open shards? This is the new CDC generation, and it could quite possibly be wholly empty at this point in time, depending on cluster propagation and the rate of data insertion into the base table. (b) Assume that data from a previous read of open shards would be equal to reading from a new set? First off, if data is being added to the base table, you would get new records. Second, if we change topology, we get a new set of open shards, whose data is only as old as the topology change.
Remember: topology changes cause the CDC stream ids (shards in Alternator) to be re-created anew. But all old data (created before the change) will remain with its old shards. Nothing moves. It is just that new data will go into the new partitioning.
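To make the consequence for a reader concrete: after a topology change the stream has to be re-enumerated, and iterators opened on shards not seen before, while the old (now closed) shards are still drained for their remaining data. A rough sketch, assuming the boto3 streams client from the sketch above, and ignoring DescribeStream pagination for brevity (a paged loop is sketched further down the thread):

def new_open_shards(streams, stream_arn, seen_ids):
    desc = streams.describe_stream(StreamArn=stream_arn)['StreamDescription']
    fresh = {}
    for shard in desc['Shards']:
        sid = shard['ShardId']
        # A shard with no EndingSequenceNumber is still open; shards of the
        # pre-change generation are closed but keep all their old data.
        is_open = 'EndingSequenceNumber' not in shard['SequenceNumberRange']
        if is_open and sid not in seen_ids:
            it = streams.get_shard_iterator(StreamArn=stream_arn,
                                            ShardId=sid,
                                            ShardIteratorType='TRIM_HORIZON')
            fresh[sid] = it['ShardIterator']
    return fresh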
@elcallio , a) because a background stress thread inserts data continuously, and new data is assumed to only be found in open shards. Would it be possible that all open shards have no records at all (although writes are inserted all the time)? b) On the contrary - the assumption is that data from the previous read is different. And again, it is not asserted, only printed to the log. The expected result is getting new, different records after a topology change, confirming that Streams handle topology changes appropriately and don't get stuck somehow.
@fruch , i think we have to consider integrating KCL into dtest for these reproduction efforts, since the issues currently reproduced in dtest are a bit different from the original KCL one.
It is absolutely possible to have an empty (no data in it) shard set, especially since there is an overlap window where data writes can go to either old shards or new shards. First off, I can't see anything in the test verifying that a shard change has actually happened between diffing prev and current - you could very well be looking at the same set of shards. It is correct that if the shard set is different, the data is different, but the check does not include the shards themselves, only the records.
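A way to close that gap - purely a sketch of what the test could assert, not what it does today - is to snapshot the set of ShardIds on each cycle and only treat a records diff as meaningful when the shard sets themselves differ:

def shard_ids(streams, stream_arn):
    # NB: DescribeStream is paged (100 shards max per call); a real check
    # must follow LastEvaluatedShardId, as discussed below.
    desc = streams.describe_stream(StreamArn=stream_arn)['StreamDescription']
    return {s['ShardId'] for s in desc['Shards']}

before = shard_ids(streams, stream_arn)
# ... decommission / add a node here ...
after = shard_ids(streams, stream_arn)
if after != before:
    # compare_records is a hypothetical test helper (e.g. deepdiff-based)
    compare_records(prev_records, cur_records)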
rerunning the SCT test, it looks like there are 100 shards, all of which are closed, with no open shard:
yarongilor@yaron-pc:/$ aws dynamodbstreams describe-stream --stream S0a16ddd0-7b46-11eb-af34-e4a0ee242883 --endpoint-url http://34.243.174.71:8080 | grep '"ShardId":' -c
100
yarongilor@yaron-pc:/$ aws dynamodbstreams describe-stream --stream S0a16ddd0-7b46-11eb-af34-e4a0ee242883 --endpoint-url http://34.243.174.71:8080 | grep EndingSequenceNumber -c
100
yarongilor@yaron-pc:/$ aws dynamodbstreams describe-stream --stream S0a16ddd0-7b46-11eb-af34-e4a0ee242883 --endpoint-url http://34.243.174.71:8080 | grep StartingSequenceNumber -c
100
@elcallio , @nyh , @fruch , the cluster is alive and can be accessed for any debugging you think of. describe_stream_S0a16ddd0-7b46-11eb-af34-e4a0ee242883.txt
You can't do just "describe-stream", since it will give back max 100 entries (look above!). Like all dynamo operations, it is paged and you need to repeat the query with ExclusiveStartShardId
set to whatever the last query gave back. It is tedious as f***, which is why you typically script it.
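For reference, a small paged loop of the kind described above, assuming the same boto3 dynamodbstreams client: each DescribeStream reply carries a LastEvaluatedShardId, which is fed back as ExclusiveStartShardId until it is absent:

def list_all_shards(streams, stream_arn):
    # Collect every shard of the stream, at most 100 per call.
    shards, last_id = [], None
    while True:
        kwargs = {'StreamArn': stream_arn}
        if last_id:
            kwargs['ExclusiveStartShardId'] = last_id
        desc = streams.describe_stream(**kwargs)['StreamDescription']
        shards.extend(desc['Shards'])
        last_id = desc.get('LastEvaluatedShardId')
        if not last_id:
            return shards

shards = list_all_shards(streams, 'S0a16ddd0-7b46-11eb-af34-e4a0ee242883')
open_shards = [s for s in shards
               if 'EndingSequenceNumber' not in s['SequenceNumberRange']]
print(len(shards), 'total shards,', len(open_shards), 'open')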
right, i was just about to write a small script for it... and indeed it proves there are some open shards. they are found eventually, after going over more than 200,000 shards:
Found 100000 StartingSequenceNumber and 100000 EndingSequenceNumber
Found 100000 StartingSequenceNumber and 100000 EndingSequenceNumber
Found 25792 StartingSequenceNumber and 4288 EndingSequenceNumber
Open shards found!
the monitor can be seen here: http://54.195.131.244:3000/d/Irdtk9sGk/longevity-alternator-streams-with-nemesis-scylla-per-server-metrics-nemesis-master?orgId=1&refresh=30s
Following that, it also looks like the original problem is not reproduced with the latest master version tested here. The grafana alternator metrics do show API calls for shard iterators and records, but on the other hand, for some reason the destination table is reported by nodetool cfstats to be of zero size all along the run:
20:33:18 < t:2021-03-02 18:33:13,707 f:kcl_thread.py l:139 c:sdcm.kcl_thread p:INFO > == CompareTablesSizesThread: dst table/src table number of partitions: 0/37727884 ==
@yarongilor according to the hydra-kcl log, it seems like it's still happening (even though the dashboard isn't showing exactly the thing we've seen); in this case getShardIterator is being called again and again, but you can see that getRecords isn't:
2021-03-02 10:57:13.915 [main] INFO StreamsAdapterDemo - Sleeping for 32400sec
2021-03-02 10:57:34.685 [ForkJoinPool-1-worker-14] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H177f27a7273:af10000000000000ca731e6f78000271
2021-03-02 10:57:34.827 [ForkJoinPool-1-worker-29] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H177f27a7273:1ba8000000000000e863de061c000931
2021-03-02 10:57:34.969 [ForkJoinPool-1-worker-30] WARN c.a.s.k.c.lib.worker.ShutdownTask - Dropped lease for shutting down ShardConsumer: H177f27a7273:88d5b6db6db6db6e94dd675fb4000081
I think in this case the decommission took place just as the KCL started, and it failed to copy anything into the target table.
Also, one of your loaders was killed by a spot termination, so I don't know if we can get more info from this cluster before you terminate it.
That log is a bit weird. It suggests that the lease for a shard is being dropped even though it is not "closed", i.e. we've reached end-of-data (GetRecords returned no next iterator) but the mechanism cannot see the shard as "closed". My guess is that this is because not all obsolete shards (in alternator) are parents - and that is the criterion kinesis uses, not the more appropriate "EndingSequenceNumber". I am unsure if this could be an issue. Since we're at end-of-data anyway it should not, but...
The worrying part is that this will cause kinesis to not update the stream position for the shard. Which, again, should not affect anything, but...
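In other words, the robust "closed" test on the DynamoDB Streams side is the presence of EndingSequenceNumber in a shard's SequenceNumberRange, not whether some newer shard names it as ParentShardId. A sketch of the two criteria side by side (field names as returned by DescribeStream):

def is_closed(shard):
    # Closed shards have both ends of their sequence-number range set.
    return 'EndingSequenceNumber' in shard['SequenceNumberRange']

def shards_kcl_sees_as_parents(shards):
    # Roughly what the kinesis/KCL logic keys on: parent references.
    return {s['ParentShardId'] for s in shards if s.get('ParentShardId')}

# An obsolete-generation shard that no newer shard references as its
# parent is closed by the first test but invisible to the second.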
Update: I think I've managed to reproduce at least one broken behaviour. Seems there is an issue with (new) stream ids, timestamps and our bookkeeping of whether we've reached shard end or not. Investigating/trying to fix. Basically, our test condition seems to fail, and we end up saying open shards are actually ended.
KCL table dump to file after Stream got stuck.
cqlsh:alternator_streams-adapter-demo> SELECT count(*) from "streams-adapter-demo" ;
count
-------
43008
alternator_streams-adapter-demo.streams-adapter-demo.log.gz alternator_streams-adapter-demo.streams-adapter-demo.txt
Is this after one decommission and adding one node? This number correlates with the number of shards; this is quite a lot.
@slivne had a suggestion today to reproduce it with one vnode per node and a smaller machine, so it runs with far fewer shards - to try to figure out whether it works when there are fewer shards.
Here are my notes from trying to reproduce it manually (it depends on having YCSB / hydra-kcl and the latest scylla-ccm), pushing a bigger and bigger number of keys with YCSB at each step:
# omitting --vnodes here would make it run with num_tokens=1
./ccm create --scylla --vnodes --version branch_4.4.rc1 -n 3 8012_repo
./ccm updateconf alternator_write_isolation:always_use_lwt alternator_port:8080 experimental:true
./ccm start
# wait until cluster up
# in the hydra-kcl repo, run
./gradlew run --args='-e http://127.0.0.1:8080 -t usertable --timeout 32400'
# run YCSB
python2.7 ./bin/ycsb load dynamodb -P workloads/workloada -P dynamodb.properties -p recordcount=100 -p insertorder=uniform -p insertcount=100 -p fieldcount=2 -p fieldlength=50 -s
# decommission
./ccm node3 nodetool decommission
# push bigger data set
python2.7 ./bin/ycsb load dynamodb -P workloads/workloada -P dynamodb.properties -p recordcount=200 -p insertorder=uniform -p insertcount=200 -p fieldcount=2 -p fieldlength=50 -s
./ccm add --scylla -b node4
./ccm node4 start
# push bigger data set
python2.7 ./bin/ycsb load dynamodb -P workloads/workloada -P dynamodb.properties -p recordcount=300 -p insertorder=uniform -p insertcount=300 -p fieldcount=2 -p fieldlength=50 -s
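To verify between the steps above that records are still flowing on the raw streams API (rather than only watching the KCL side), a short polling pass can be dropped in after each YCSB load - reusing the streams client and list_all_shards helper sketched earlier in the thread:

import time

def drain_once(streams, stream_arn):
    # One GetRecords pass over all currently open shards (always from
    # TRIM_HORIZON, so per-shard counts are cumulative between calls).
    total = 0
    for shard in list_all_shards(streams, stream_arn):
        if 'EndingSequenceNumber' in shard['SequenceNumberRange']:
            continue  # closed shard from an older generation
        it = streams.get_shard_iterator(
            StreamArn=stream_arn, ShardId=shard['ShardId'],
            ShardIteratorType='TRIM_HORIZON')['ShardIterator']
        total += len(streams.get_records(ShardIterator=it)['Records'])
    return total

# e.g. right after the decommission and again after the next YCSB load:
for _ in range(5):
    print('records seen:', drain_once(streams, stream_arn))
    time.sleep(10)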
When I have more time to sit down with it, I might be able to pack it all into a dtest, but we'll follow up on @yarongilor's reproduction run with more debugging and with ShardSyncStrategyType.PERIODIC
(as suggested here: https://github.com/fruch/hydra-kcl/pull/5)
The issue is reproduced when running only a single decommission and using a smaller DB instance type, so there are fewer SMP cores and fewer Stream shards.
Installation details
Kernel version: 5.11.6-1.el7.elrepo.x86_64
Scylla version (or git commit hash): 4.5.dev-0.20210319.abab1d906c with build-id c60f1de6f6a94a2b9a6219edd82727f66aaf8c83
Cluster size: 6 nodes (i3.large)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0c8a8490cf80a53a0
(aws: eu-west-1)
Test: longevity-alternator-streams-with-nemesis
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Issue description
====================================
PUT ISSUE DESCRIPTION HERE
====================================
Restore Monitor Stack command: $ hydra investigate show-monitor a7453f7c-eddb-4777-9cfa-c7931bbcc2f5
Show all stored logs command: $ hydra investigate show-logs a7453f7c-eddb-4777-9cfa-c7931bbcc2f5
Test id: a7453f7c-eddb-4777-9cfa-c7931bbcc2f5
Logs:
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_101815/grafana-screenshot-alternator-20210321_102430-alternator-streams-nemesis-stream-w-monitor-node-a7453f7c-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_101815/grafana-screenshot-longevity-alternator-streams-with-nemesis-scylla-per-server-metrics-nemesis-20210321_102155-alternator-streams-nemesis-stream-w-monitor-node-a7453f7c-1.png
grafana - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_101815/grafana-screenshot-overview-20210321_101815-alternator-streams-nemesis-stream-w-monitor-node-a7453f7c-1.png
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_150530/db-cluster-a7453f7c.zip
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_150530/loader-set-a7453f7c.zip
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_150530/monitor-set-a7453f7c.zip
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/a7453f7c-eddb-4777-9cfa-c7931bbcc2f5/20210321_150530/sct-runner-a7453f7c.zip
Installation details
Scylla version (or git commit hash): 4.4.dev-0.20210104.d5da455d9 with build-id 56ab4a808a56049c86c5af7a0f4a680dc37f36d1
Cluster size: 6 nodes (i3.4xlarge)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-018ec229a4231314c
(aws: eu-west-1)
Test: longevity-alternator-streams-with-nemesis
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Issue description
====================================
scenario:
==> result: YCSB reported verification errors as below.
====================================
Restore Monitor Stack command: $ hydra investigate show-monitor b71161a4-182a-4447-8a35-9e9c63e89b16
Show all stored logs command: $ hydra investigate show-logs b71161a4-182a-4447-8a35-9e9c63e89b16
Test id: b71161a4-182a-4447-8a35-9e9c63e89b16
Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/b71161a4-182a-4447-8a35-9e9c63e89b16/20210105_233028/db-cluster-b71161a4.zip
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/b71161a4-182a-4447-8a35-9e9c63e89b16/20210105_233028/loader-set-b71161a4.zip
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/b71161a4-182a-4447-8a35-9e9c63e89b16/20210105_233028/monitor-set-b71161a4.zip
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/b71161a4-182a-4447-8a35-9e9c63e89b16/20210105_233028/sct-runner-b71161a4.zip
Jenkins job URL
YCSB verification error:
the YCSB insert and read commands, as found in the SCT yaml file, are:
some temporary YCSB write connection issues were noticed during the test, probably due to decommissions, but overall it reported a successful run:
the second YCSB thread, running the second range of inserts, reported successful writes as well:
the issue was also reproduced on another run, using a ChaosMonkey of multiple nemeses instead of the Decommission nemesis: