Open juliayakovlev opened 8 months ago
Found https://github.com/scylladb/scylladb/issues/16321. Not sure if it is the same or similar.
I don't think that's a similar issue. The mentioned issue had a problem with restoring the schema multiple times on the same cluster, which is not supported. I haven't seen that in this issue.
It looks like the file me-138-big-Index.db is present in the SM manifest, but it's missing in the backup location, and that causes the restore to fail.
From SM logs it looks like the test scenario goes like this:
But the strange thing is that the backup generates snapshot tag sm_20240112221504UTC, but both restores use snapshot tag sm_20230702235739UTC. Is this expected? Where does the snapshot tag used for restore come from, and is there a chance that this backup is broken (misses s3:manager-backup-tests-permanent-snapshots-us-east-1/backup/sst/cluster/0f0f556f-eb17-4012-b39c-f99a35828c04/dc/us-east/node/15430605-a376-4758-9205-014ab34ad5d5/keyspace/100gb_sizetiered_2022_2/table/standard1/07206f60192311eea6af23bef1a3e064/me-138-big-Index.db)?
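To make the gap between the two snapshot tags concrete, here is a minimal sketch that parses them and compares their timestamps. It assumes the tag layout is `sm_` + `YYYYMMDDhhmmss` + `UTC`, which is inferred only from the two tags quoted above; the helper name `parse_snapshot_tag` is hypothetical and not part of SM or sctool.

```python
from datetime import datetime, timezone


def parse_snapshot_tag(tag: str) -> datetime:
    """Parse a tag such as sm_20240112221504UTC into an aware datetime.

    Assumption: the tag is 'sm_' + 'YYYYMMDDhhmmss' + 'UTC', as seen in
    the two tags quoted in this thread.
    """
    if not (tag.startswith("sm_") and tag.endswith("UTC")):
        raise ValueError(f"unexpected snapshot tag format: {tag!r}")
    stamp = tag[len("sm_"):-len("UTC")]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


backup_tag = "sm_20240112221504UTC"   # tag produced by the backup in this run
restore_tag = "sm_20230702235739UTC"  # tag used by both restores

age = parse_snapshot_tag(backup_tag) - parse_snapshot_tag(restore_tag)
print(f"restore snapshot is {age.days} days older than the fresh backup")
```

If the assumed layout holds, the restores are reading a snapshot taken roughly half a year before this run, so they cannot be restoring the backup that the test just created.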
I validated that this file is indeed missing from the s3 dir, so it's either a problem with a test (using predefined backup instead of the fresh one) or just a problem with predefined backup that's not part of the test. @juliayakovlev can we close this issue?
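The check described above (manifest entry present, object absent) can be sketched as a plain set comparison. This is only an illustration: it does not read the real SM manifest format or call S3. It assumes you have already extracted the file names the manifest lists for one table and the object keys found under the backup location. The function name, the shortened prefix, and the `me-138-big-Data.db` entry are hypothetical.

```python
def missing_from_backup(manifest_files, backup_keys, table_prefix):
    """Return manifest file names with no matching object key.

    manifest_files: file names the manifest lists for one table
    backup_keys:    object keys actually found in the backup location
    table_prefix:   key prefix of that table's directory (no trailing slash)
    """
    present = set(backup_keys)
    return sorted(
        name for name in manifest_files
        if f"{table_prefix}/{name}" not in present
    )


# Hypothetical, shortened prefix; the real one is the long sst/cluster/... path.
prefix = "backup/sst/example-table"
manifest = ["me-138-big-Data.db", "me-138-big-Index.db"]
found = [f"{prefix}/me-138-big-Data.db"]

print(missing_from_backup(manifest, found, prefix))
```

Running a comparison like this for every table in the manifest would show whether me-138-big-Index.db is the only missing object or one of several, which helps tell an incomplete upload apart from a single later deletion.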
@ShlomiBalalis can you take a look at this, please?
@juliayakovlev , @ShlomiBalalis - any updates?
@ShlomiBalalis can you advise, please?
Hi! Sorry for the long silence. Yes, the file is missing, but I can't say for certain whether it was missing from the start, ever since we created the backup, or went missing somewhere down the road. There is no Lifecycle rule that would cause this file to be deleted, so if it was properly created in the first place, I don't know how it went missing. I'll try to find the logs of the original run to see if they are of any help.
The file was created over six months ago as part of another test run. Would that be a problem?
SM should have no problem with restoring old backups.
@ShlomiBalalis any news? It continues to fail.
@ShlomiBalalis ping
@mikliapko is this something that you could take care of? I mean validating if this is a problem with some incomplete, cached backup or is it an actual issue.
Issue description
MgmtRestore nemesis failed with error:
Client version: 3.2.5-0.20231206.8b378dea
Server version: 3.2.5-0.20231206.8b378dea
Impact
sctool restore failed. No other impact observed.
How frequently does it reproduce?
Found this issue. Not sure if it is the same or similar.
Installation details
Kernel Version: 5.15.0-1051-aws
Scylla version (or git commit hash): 2023.1.4-20240112.12c616e7f0cf with build-id e7263a4aa92cf866b98cf680bd68d7198c9690c0
Cluster size: 4 nodes (i3en.2xlarge)
Scylla Nodes used in this run:
OS / Image: ami-08b5f8ff1565ab9f0 (aws: undefined_region)
Test: longevity-twcs-48h-test
Test id: 54645511-775e-4d02-8fd8-35a38a4a2df8
Test name: enterprise-2023.1/longevity/longevity-twcs-48h-test
Test config file(s):
Logs and commands
- Restore Monitor Stack command: `$ hydra investigate show-monitor 54645511-775e-4d02-8fd8-35a38a4a2df8`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=54645511-775e-4d02-8fd8-35a38a4a2df8)
- Show all stored logs command: `$ hydra investigate show-logs 54645511-775e-4d02-8fd8-35a38a4a2df8`

## Logs:

- **db-cluster-54645511.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/54645511-775e-4d02-8fd8-35a38a4a2df8/20240113_101418/db-cluster-54645511.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/54645511-775e-4d02-8fd8-35a38a4a2df8/20240113_101418/db-cluster-54645511.tar.gz)
- **sct-runner-events-54645511.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/54645511-775e-4d02-8fd8-35a38a4a2df8/20240113_101418/sct-runner-events-54645511.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/54645511-775e-4d02-8fd8-35a38a4a2df8/20240113_101418/sct-runner-events-54645511.tar.gz)
- **sct-54645511.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/54645511-775e-4d02-8fd8-35a38a4a2df8/20240113_101418/sct-54645511.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/54645511-775e-4d02-8fd8-35a38a4a2df8/20240113_101418/sct-54645511.log.tar.gz)
- **loader-set-54645511.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/54645511-775e-4d02-8fd8-35a38a4a2df8/20240113_101418/loader-set-54645511.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/54645511-775e-4d02-8fd8-35a38a4a2df8/20240113_101418/loader-set-54645511.tar.gz)
- **monitor-set-54645511.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/54645511-775e-4d02-8fd8-35a38a4a2df8/20240113_101418/monitor-set-54645511.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/54645511-775e-4d02-8fd8-35a38a4a2df8/20240113_101418/monitor-set-54645511.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/enterprise-2023.1/job/longevity/job/longevity-twcs-48h-test/7/)
[Argus](https://argus.scylladb.com/test/3c965e5e-a758-4f96-9a5d-2ad5a58921bb/runs?additionalRuns[]=54645511-775e-4d02-8fd8-35a38a4a2df8)