Open ShlomiBalalis opened 1 year ago
@ShlomiBalalis - where can I find the manager log, so we can see what was restored?
Out of curiosity, why is it using LCS?
2023-08-14T14:03:55+00:00 longevity-200gb-48h-verify-limited--db-node-84dfb4de-1 !INFO | scylla[13934]: [shard 12] LeveledManifest - Leveled compaction strategy is restoring invariant of level 1 by compacting 2 sstables on behalf of keyspace1.standard1
> @ShlomiBalalis - where can I find the manager log, so we can see what was restored?
The server log is in the monitor tarball; the agent logs are in the db nodes.
> Out of curiosity, why is it using LCS?
>
> 2023-08-14T14:03:55+00:00 longevity-200gb-48h-verify-limited--db-node-84dfb4de-1 !INFO | scylla[13934]: [shard 12] LeveledManifest - Leveled compaction strategy is restoring invariant of level 1 by compacting 2 sstables on behalf of keyspace1.standard1
This is simply part of the longevity scenario, but this is not the problematic keyspace anyway
So the logs show that some data has actually been downloaded and loaded to the cluster. The problem is that both the automatic and the manual repair (still present in this test scenario) didn't repair the restored table.
So right now I'm checking whether it's a restore or a repair problem (the tested version of SM does not contain the repair refactor, so this is not connected to those changes).
Leading theory: I tried to restore the keyspace manually the old-fashioned way, by downloading the sstables and refreshing, but we noticed something funny: at first, I tried to query the keyspace right after the restore, and it consistently failed:
cassandra@cqlsh:5gb_sizetiered_2022_1> select * from standard1;
NoHostAvailable:
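For reference, the manual restore here amounted to roughly the following (a sketch - the backup path is a placeholder, and the data directory assumes the default Scylla layout):

```bash
# Copy the backed-up sstables into the table's upload directory
# (placeholder backup path; default Scylla data layout assumed):
cp /path/to/backup/sstables/* \
   /var/lib/scylla/data/5gb_sizetiered_2022_1/standard1-*/upload/

# Load the newly placed sstables on this node:
nodetool refresh -- 5gb_sizetiered_2022_1 standard1
```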
Then, I tried to change the replication factor of the keyspace, and noticed that while the region of the cluster under test is eu-west:
$ nodetool status
Datacenter: eu-west
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.4.3.146 52.54 GB 256 ? c37bdb3d-7a3b-477a-b4fb-a4a98684a2c5 1a
UN 10.4.0.236 49.08 GB 256 ? 5d5bd234-9aec-4146-b3cb-b8e2e1729fa4 1a
UN 10.4.0.248 44.61 GB 256 ? 2747eaee-4803-490a-ad78-03467dd1f7cc 1a
UN 10.4.0.171 44.33 GB 256 ? 1434be9a-d258-4dd2-9579-2cec850786c1 1a
The keyspace was set to replicate in us-east, which is probably the region of the originally backed up cluster:
cassandra@cqlsh> SELECT * FROM system_schema.keyspaces;
keyspace_name | durable_writes | replication
-------------------------------+----------------+-------------------------------------------------------------------------------------
system_auth | True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'eu-west': '4'}
system_schema | True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
keyspace1 | True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'eu-west': '3'}
system_distributed | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '3'}
system | True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
audit | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}
system_traces | True | {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '2'}
system_distributed_everywhere | True | {'class': 'org.apache.cassandra.locator.EverywhereStrategy'}
5gb_sizetiered_2022_1 | True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'us-east': '3'}
Once I altered the keyspace's replication region, I was able to query it just fine:
cassandra@cqlsh> ALTER KEYSPACE "5gb_sizetiered_2022_1" WITH replication = {'class': 'NetworkTopologyStrategy', 'eu-west': '1'};
cassandra@cqlsh> use "5gb_sizetiered_2022_1";
cassandra@cqlsh:5gb_sizetiered_2022_1> select * from standard1;
key | C0 | C1 | C10 | C11 | C12 | C13 | C14 | C15 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9
------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------
0x343831364b4b33324e30 | 0x88cfad6776a64370624fca6a579b22909dd7d10500537a183446f9f429dea1d13fcb3716aba68f5023a974a1f6fef5fd3e3eb2856c8a08a38fc019de9fe45b90 | 0x68a108ecfcca46efb65d38269f75d23a1f43456a1cbe033baefc0cd2f3edbfa4289874dc57ae085e9cd830ae0644351a3c32c6d49140f81e714d715f2324cb75 | 0xa0a206e6889d81d48edda04842b35a248f3a608bd4588619ca39176f64ca53238913a404fba7ec3e67c071b35e13e2a39773610a5b541dc7a8f32cadc7c7eedf | 0xb7a13a1ad602380b4577dc5bb64865e54922862cf670bf288d3fb9afd69091477c623e9255e1d81068bd0707e01c0680cc306cb3693be8688c0db1c948ea38c3 | 0x7e1119d0cb8f34cd7141ba1ec7cf71eb64f254b0d46fc0f78b31fa3c1fe336eae57412dbaad94d4c728ca51140438d5e2521587f657d7dcfffdeeeb1218b2357 | 0x5dfd6b7923a1025f0085ecf43516aec54c25ced79dc5217267c060ddc927de711b0eec16116eeb2380f184bb1a7d6f9482bcdd1f4c75d7c4cacd42950746e4fa | 0x1ddb930018f516dc7e3ffafddc7ac358df4f3d2352931ae31982c55cc7e0d7dc9ea6067de7218a9e61f735f69ab3eb1ceb3b27e6300deee70c6c455cd20e6a14 | 0x6812d3acf717ba682498373953e77d64792930f029e5ab2b2c4a477098f9b49f0e6d35615e9e65b7736ee992ab3ff027227c73595e71f355b6b89e1ab1c7fb9d | 0x51ae72d57ce76acc6c69c90713f7d4fb9261efcfc833e73b30e383e70eb56aea4e11c2b51053e7479142041df5bd832fd6417e835a851378433e0de71bbdaee0 | 0xa9fa0041d3270b15f1700778bea29b99a7ea7c2172e338157ca41593f99e3a04a6a649c698bc01b888f6038b8740678554f41de84a3fb66390d300328068d204 | 0xec0375be958914ee5a7797c6921dfa0b309d95cf98fc9dd846dbfcc982d2ad0da27a7d17f7b1ff6c6fcca1c816fc47f5b96af5a50e1c28ae9e31351d250b6aab | 0x405d6d52c7782e3b8271a809e4138ece48bb4c0c203c65368008e778c23d1c2fe2a8105b89cf2141ddbb9090b1f69192af21afeba81c05d70880179a6300b745 | 0x3e4ed6b0621aa8ebfdf0035d417727357ef13ccc7e20bb8489f00dce99ce5b3690ccf2ec7759a4f0d5134fa3ac0471dad663a1a934cfc3cafe621f39dcf9c112 | 0xd046c4ed74a6821b7739342f48419f07b1a0d69175c239ccc0a504ddd0c440f02f233c9a898e2d59a3111479e0166cb4b7745b1322f9fddefd3bb197f8c60a34 | 0x553fff6a14fc45ba273ae9549962324615e90d9d79933eb7121eb5741c1c773da5503824f4d8b6584a4407cabbd5d6862f7eeb1bb76690fd14442834a45afd7f | 0xbb9b3a6ac0acfe1e67bb87870b45be7f9119439c691d39c8b69412c4ec8b513bcf6b4965af98e61711cb7da504b252cd716ce29a0d8772c3f1b89d349c467f8f
...
So, the difference in regions is probably the cause of the failures.
So, restore only works in the same region, and there is a procedure to restore to a different region? This is acceptable, but it needs to be explicitly documented.
Restoring tables requires a schema identical to the one in the backup. The DCs are also a part of the keyspace schema, so the fact that restore does not work when you try to restore data into an empty DC seems logical.
The strange part here is that load&stream does not complain when it has to upload sstables to nodes from an empty DC (we can add manual checks for that in SM). I would suspect that in this scenario the uploaded sstables should be lost, as they don't belong to any node in the cluster, but maybe L&S still stores them somewhere, even though it's impossible to query the data because of the "unavailable replicas" error.
In your example you said that you used `nodetool refresh` for uploading sstables, but did you use it with the `-las` option?
I'm curious if a workaround in this case should look like:

- restore schema
- (change replication of keyspace with non-existing dc - but do we have a guarantee that the restore of tables will work when using a different keyspace schema?)
- restore tables
- (or maybe here is the right place for changing keyspace replication - perhaps the uploaded data is still stored somewhere in the cluster and now it is safe to alter the keyspace)
But at least we know that this issue is not a regression and that IMHO restore works as described in the docs.
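A quick way to cross-check such a DC mismatch (a sketch, using the keyspace from this issue):

```bash
# DCs that actually exist in the destination cluster:
nodetool status | grep '^Datacenter:'

# DCs the restored keyspace expects to replicate to:
cqlsh -e "SELECT replication FROM system_schema.keyspaces WHERE keyspace_name = '5gb_sizetiered_2022_1';"
```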
> Restoring tables requires a schema identical to the one in the backup. The DCs are also a part of the keyspace schema, so the fact that restore does not work when you try to restore data into an empty DC seems logical.
>
> The strange part here is that load&stream does not complain when it has to upload sstables to nodes from an empty DC (we can add manual checks for that in SM). I would suspect that in this scenario the uploaded sstables should be lost, as they don't belong to any node in the cluster, but maybe L&S still stores them somewhere, even though it's impossible to query the data because of the "unavailable replicas" error.
> In your example you said that you used `nodetool refresh` for uploading sstables, but did you use it with the `-las` option?
Nope, a simple `nodetool refresh -- 5gb_sizetiered_2022_1 standard1`.
> I'm curious if a workaround in this case should look like:
>
> - restore schema
> - (change replication of keyspace with non-existing dc - but do we have a guarantee that the restore of tables will work when using a different keyspace schema?)
> - restore tables
> - (or maybe here is the right place for changing keyspace replication - perhaps the uploaded data is still stored somewhere in the cluster and now it is safe to alter the keyspace)
In my case, I first loaded the data with `refresh` and only then altered the keyspace, and everything seemed fine afterwards (of course, it was only a preliminary check that the table contains data at all).
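For comparison, the load-and-stream variant of the same command would look like this (a sketch, assuming Scylla's `--load-and-stream` option for `nodetool refresh`, which streams partitions to their owning nodes instead of ignoring them):

```bash
# Refresh with load-and-stream: partitions not owned by this node are
# streamed to their owners rather than dropped:
nodetool refresh 5gb_sizetiered_2022_1 standard1 --load-and-stream
```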
> Nope, a simple `nodetool refresh -- 5gb_sizetiered_2022_1 standard1`.
That's strange, because the `nodetool refresh` docs say:

> Scylla node will ignore the partitions in the sstables which are not assigned to this node. For example, if sstable are copied from a different node.
So I would expect that it worked only partially / that it's not reliable to use it in this way. So the approach with:

- restore schema
- alter restored keyspace replication strategy (change dc names)
- restore data

seems more promising. @asias, do you think that this approach is safe and should work?
Context: We have a backup from some cluster with only dc1. We want to restore it to a different cluster with only dc2. Normally, SM would first restore all schema from the backup (this requires a cluster restart) and then it would proceed with restoring non-schema SSTables via load&stream. The problem is that we restore SSTables into a keyspace replicated only in dc1, and we don't have any nodes from this dc in the restore destination cluster, so even though the restore procedure ends "successfully", the data is not there. Is it safe to use load&stream on SSTables when the backed-up and restore destination clusters have identical table schema, but different keyspace schema (the keyspace name is the same, but there are different dc names in the replication strategies)?
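If it is safe, the workaround could look roughly like this (a sketch - cluster name, backup location, and snapshot tag are placeholders):

```bash
# 1. Restore only the schema from the backup (followed by the rolling
#    restart that the schema restore procedure requires):
sctool restore -c my-cluster -L s3:my-backup-bucket -T sm_20230814130000UTC --restore-schema

# 2. Fix the DC names in the restored keyspace before loading any data:
cqlsh -e "ALTER KEYSPACE \"5gb_sizetiered_2022_1\" WITH replication = {'class': 'NetworkTopologyStrategy', 'eu-west': '3'};"

# 3. Restore the actual data:
sctool restore -c my-cluster -L s3:my-backup-bucket -T sm_20230814130000UTC --restore-tables
```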
> Nope, a simple `nodetool refresh -- 5gb_sizetiered_2022_1 standard1`.
>
> That's strange, because the `nodetool refresh` docs say:
>
> > Scylla node will ignore the partitions in the sstables which are not assigned to this node. For example, if sstable are copied from a different node.
>
> So I would expect that it worked only partially / that it's not reliable to use it in this way. So the approach with:
>
> - restore schema
> - alter restored keyspace replication strategy (change dc names)
> - restore data
Yeah, regardless of the fact that it worked (and I agree, it's strange it worked at all), this is probably the correct course of action as far as I can tell.
My local experiments confirm that the approach:

- restore schema
- alter restored keyspace replication strategy (change dc names)
- restore data

works fine, but they are just experiments and not proofs of reliability. @ShlomiBalalis, could we rerun this test scenario with an additional `alter keyspace` step in the middle of both restores?
@ShlomiBalalis ping
@ShlomiBalalis ?
> My local experiments confirm that the approach:
>
> - restore schema
> - alter restored keyspace replication strategy (change dc names)
> - restore data
>
> works fine, but they are just experiments and not proofs of reliability. @ShlomiBalalis, could we rerun this test scenario with an additional `alter keyspace` step in the middle of both restores?
@Mark-Gurevich can you please take over this? If needed, let's open an issue in SCT to add this as a workaround until this issue is fixed.
@Michal-Leszczynski mind taking ownership of this issue?
IIUC, we need to add an additional `alter keyspace` step to the `disrupt_mgmt_restore` nemesis code, in the middle of both restores?
From a brief look at the code, I didn't find where this can be added. Needs a further deep dive.
@mikliapko is this something that you could take care of? I mean validating that the procedure described in https://github.com/scylladb/scylla-manager/issues/3525#issuecomment-1693310241 works fine with some proper test. When it's validated, we can add it to SM docs.
> @mikliapko is this something that you could take care of? I mean validating that the procedure described in #3525 (comment) works fine with some proper test. When it's validated, we can add it to SM docs.
Yep, as it's still happening, I will take a look into it.
> @mikliapko is this something that you could take care of? I mean validating that the procedure described in #3525 (comment) works fine with some proper test. When it's validated, we can add it to SM docs.
>
> Yep, as it's still happening, I will take a look into it.
@mikliapko it's happening in a test that disables raft topology; does the schema restore depend on raft topology?
- Scylla version: `6.3.0~dev-20240927.c17d35371846` with build-id `a9b08d0ce1f3cf99eb39d7a8372848fa2840dc1d`
- Kernel Version: `6.8.0-1016-aws`
- Cluster size: 5 nodes (i4i.8xlarge)
- Scylla Nodes used in this run:
- OS / Image: `ami-087d814d9b6773015` (aws: undefined_region)
- Test: longevity-mv-si-4days-streaming-test
- Test id: `34c4d009-73b1-490b-83e5-03f6705be5eb`
- Test name: scylla-master/tier1/longevity-mv-si-4days-streaming-test
- Test method: `longevity_test.LongevityTest.test_custom_time`
- Test config file(s):
Starting from SM 3.3 and Scylla 6.0, SM restores the schema by applying the output of `DESC SCHEMA WITH INTERNALS`.
The problem is that the keyspace definition contains DC names - that's why this test fails with the following error:
"M":"Run ended with ERROR","task":"restore/09af96b8-68b1-4bf6-928b-7fd01aa266f4","status":"ERROR","cause":"restore data: create \"100gb_sizetiered_6_0\" (\"100gb_sizetiered_6_0\") with CREATE KEYSPACE \"100gb_sizetiered_6_0\" WITH replication = {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'us-east': '3'} AND durable_writes = true: Unrecognized strategy option {us-east} passed to org.apache.cassandra.locator.NetworkTopologyStrategy for keyspace 100gb_sizetiered_6_0","duration":"5.618998928s"
So right now this is a documented limitation, but we should make it possible to restore the schema into a different DC setting, or make it easier for the user to modify just the DC part of the keyspace schema.
Created an issue for that: #4049.
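For illustration, the manual adjustment a user currently has to make could look like this (a sketch - the destination DC name and RF are assumed from this issue's example):

```bash
# Recreate the keyspace with the destination cluster's DC so the data
# restore has a valid replication target (DC name and RF are assumptions):
cqlsh -e "CREATE KEYSPACE \"100gb_sizetiered_6_0\" WITH replication = {'class': 'NetworkTopologyStrategy', 'eu-west': '3'} AND durable_writes = true;"
```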
## Issue description
At 2023-08-14 13:27:09,663, we started two restore tasks that use a pre-created snapshot which includes the keyspace `5gb_sizetiered_2022_1`. First, a task to restore the schema, which ended successfully. At that point, we restarted all of the nodes' services in the cluster, one by one. Afterwards, we restored the data, which also passed. We then created a general repair task (since this code was not adjusted to the automatic repair just yet), which passed as well. Then, we executed a cassandra-stress command to validate the data, which was DOA. Looking into the data folders on the machines as well, it seems that they are completely empty.
## Installation details
- Kernel Version: `5.15.0-1040-aws`
- Scylla version (or git commit hash): `2022.2.12-20230727.f4448d5b0265` with build-id `a87bfeb65d24abf65d074a3ba2e5b9664692d716`
- Cluster size: 4 nodes (i3.4xlarge)
- Scylla Nodes used in this run:
- OS / Image: `ami-0624755b4db06e567` (aws: eu-west-1)
- Test: longevity-200gb-48h-test_restore-nemesis
- Test id: `84dfb4de-0573-4a01-8806-8b832bcafd91`
- Test name: scylla-staging/Shlomo/longevity-200gb-48h-test_restore-nemesis
- Test config file(s):

## Logs and commands

- Restore Monitor Stack command: `$ hydra investigate show-monitor 84dfb4de-0573-4a01-8806-8b832bcafd91`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=84dfb4de-0573-4a01-8806-8b832bcafd91)
- Show all stored logs command: `$ hydra investigate show-logs 84dfb4de-0573-4a01-8806-8b832bcafd91`

## Logs:

- **db-cluster-84dfb4de.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/84dfb4de-0573-4a01-8806-8b832bcafd91/20230814_140710/db-cluster-84dfb4de.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/84dfb4de-0573-4a01-8806-8b832bcafd91/20230814_140710/db-cluster-84dfb4de.tar.gz)
- **sct-runner-events-84dfb4de.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/84dfb4de-0573-4a01-8806-8b832bcafd91/20230814_140710/sct-runner-events-84dfb4de.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/84dfb4de-0573-4a01-8806-8b832bcafd91/20230814_140710/sct-runner-events-84dfb4de.tar.gz)
- **sct-84dfb4de.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/84dfb4de-0573-4a01-8806-8b832bcafd91/20230814_140710/sct-84dfb4de.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/84dfb4de-0573-4a01-8806-8b832bcafd91/20230814_140710/sct-84dfb4de.log.tar.gz)
- **loader-set-84dfb4de.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/84dfb4de-0573-4a01-8806-8b832bcafd91/20230814_140710/loader-set-84dfb4de.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/84dfb4de-0573-4a01-8806-8b832bcafd91/20230814_140710/loader-set-84dfb4de.tar.gz)
- **monitor-set-84dfb4de.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/84dfb4de-0573-4a01-8806-8b832bcafd91/20230814_140710/monitor-set-84dfb4de.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/84dfb4de-0573-4a01-8806-8b832bcafd91/20230814_140710/monitor-set-84dfb4de.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/scylla-staging/job/Shlomo/job/longevity-200gb-48h-test_restore-nemesis/16/)
[Argus](https://argus.scylladb.com/test/226c0f08-de6f-4d69-8f77-b01161019748/runs?additionalRuns[]=84dfb4de-0573-4a01-8806-8b832bcafd91)