scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

Scylla failed to start after some sstable files deleted: init - Startup failed: std::runtime_error .. (error system:2, filesystem error: stat failed: No such file or directory #5431

Open yarongilor opened 1 year ago

yarongilor commented 1 year ago

Installation details

Kernel Version: 5.15.0-1019-aws
Scylla version (or git commit hash): 2022.2.0~rc2-20220919.75d087a2b75a with build-id 463f1a57b82041a6c6b6441f0cbc26c8ad93091e
Relocatable Package: http://downloads.scylladb.com/downloads/scylla-enterprise/relocatable/scylladb-2022.2/scylla-enterprise-x86_64-package-2022.2.0-rc2.0.20220919.75d087a2b75a.tar.gz
Cluster size: 5 nodes (i3.4xlarge)

Scylla Nodes used in this run:

OS / Image: ami-00bd31f22bcf5ae1a (aws: eu-west-1)

Test: ics-longevity-mv-si-4days-test
Test id: 4a622274-af57-417f-a1ec-4cc4c89af60e
Test name: enterprise-2022.2/SCT_Enterprise_Features/ICS/ics-longevity-mv-si-4days-test
Test config file(s):

Issue description

>>>>>>> Scenario:

  1. Run a configuration of MV + SI.
  2. Run a disrupt_rebuild_streaming_err nemesis:
  3. stop node-13
  4. delete some sstable files.
  5. start node-13
  6. node got the following errors:
    < t:2022-10-01 12:50:15,509 f:nemesis.py      l:1005 c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: Set current_disruption -> RebuildStreamingErr Node longevity-mv-si-4d-2022-2-db-node-4a622274-13 [54.75.69.155 | 10.4.3.195] (seed: False)
    < t:2022-10-01 12:50:15,514 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:INFO  > 2022-10-01 12:50:15.509: (DisruptionEvent Severity.NORMAL) period_type=begin event_id=f394ffa3-701c-475a-9ed0-104f96cbf862: nemesis_name=RebuildStreamingErr target_node=Node longevity-mv-si-4d-2022-2-db-node-4a622274-13 [54.75.69.155 | 10.4.3.195] (seed: False)
    < t:2022-10-01 12:51:39,450 f:nemesis.py      l:986  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: Files /var/lib/scylla/data/mview/users_by_email-25f851803f1411ed91e552472854bf91/me-3254-* were destroyed
    < t:2022-10-01 12:51:40,000 f:nemesis.py      l:986  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: Files /var/lib/scylla/data/mview/users_by_initials-da5d43803f6711ed800e8e21fe1dd5a1/me-3245-* were destroyed
    < t:2022-10-01 12:51:40,511 f:nemesis.py      l:986  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: Files /var/lib/scylla/data/mview/users_by_password-27135e703f1411ed8c10fc81479e6e02/me-3247-* were destroyed
    < t:2022-10-01 12:51:41,064 f:nemesis.py      l:986  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: Files /var/lib/scylla/data/mview/users_by_address-d98f6c303f6711ed876201c995b8b238/me-3430-* were destroyed
    < t:2022-10-01 12:51:41,575 f:nemesis.py      l:986  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: Files /var/lib/scylla/data/sec_index/users_last_access_ind_index-244d2e613f6811ed800e8e21fe1dd5a1/me-3682-* were destroyed
    < t:2022-10-01 12:52:49,420 f:db_log_reader.py l:113  c:sdcm.db_log_reader   p:DEBUG > 2022-10-01T12:52:49+00:00 longevity-mv-si-4d-2022-2-db-node-4a622274-13      !ERR | scylla[115686]:  [shard  0] database - Exception while populating keyspace 'system_schema' with column family 'view_virtual_columns' from file '/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa': std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory [/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/snapshots/sm_20220930165632UTC/me-982982-big-Summary.db])
    < t:2022-10-01 12:52:49,423 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 2022-10-01 12:52:49.421 <2022-10-01 12:52:49.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=43db8d81-4ac4-4650-9b6b-25d32dd4c26b: type=FILESYSTEM_ERROR regex=filesystem_error line_number=2967121 node=longevity-mv-si-4d-2022-2-db-node-4a622274-13
    < t:2022-10-01 12:52:49,423 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 2022-10-01T12:52:49+00:00 longevity-mv-si-4d-2022-2-db-node-4a622274-13      !ERR | scylla[115686]:  [shard  0] database - Exception while populating keyspace 'system_schema' with column family 'view_virtual_columns' from file '/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa': std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory [/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/snapshots/sm_20220930165632UTC/me-982982-big-Summary.db])
    < t:2022-10-01 13:03:48,806 f:db_log_reader.py l:113  c:sdcm.db_log_reader   p:DEBUG > 2022-10-01T13:03:48+00:00 longevity-mv-si-4d-2022-2-db-node-4a622274-13      !ERR | scylla[115686]:  [shard  0] init - Startup failed: std::runtime_error (Exception while populating keyspace 'system_schema' with column family 'view_virtual_columns' from file '/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa': std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory [/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/snapshots/sm_20220930165632UTC/me-982982-big-Summary.db]))
    < t:2022-10-01 13:03:48,808 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 2022-10-01T13:03:48+00:00 longevity-mv-si-4d-2022-2-db-node-4a622274-13      !ERR | scylla[115686]:  [shard  0] init - Startup failed: std::runtime_error (Exception while populating keyspace 'system_schema' with column family 'view_virtual_columns' from file '/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa': std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory [/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/snapshots/sm_20220930165632UTC/me-982982-big-Summary.db]))
    < t:2022-10-01 13:05:18,081 f:db_log_reader.py l:113  c:sdcm.db_log_reader   p:DEBUG > 2022-10-01T13:05:18+00:00 longevity-mv-si-4d-2022-2-db-node-4a622274-13      !ERR | scylla[116163]:  [shard  0] database - Exception while populating keyspace 'system_schema' with column family 'aggregates' from file '/var/lib/scylla/data/system_schema/aggregates-924c55872e3a345bb10c12f37c1ba895': std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory [/var/lib/scylla/data/system_schema/aggregates-924c55872e3a345bb10c12f37c1ba895/snapshots/sm_20220930165632UTC/me-1007076-big-CompressionInfo.db])
    < t:2022-10-01 13:05:18,083 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 2022-10-01 13:05:18.082 <2022-10-01 13:05:18.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=43db8d81-4ac4-4650-9b6b-25d32dd4c26b: type=FILESYSTEM_ERROR regex=filesystem_error line_number=2996033 node=longevity-mv-si-4d-2022-2-db-node-4a622274-13
    < t:2022-10-01 13:05:18,083 f:file_logger.py  l:101  c:sdcm.sct_events.file_logger p:ERROR > 2022-10-01T13:05:18+00:00 longevity-mv-si-4d-2022-2-db-node-4a622274-13      !ERR | scylla[116163]:  [shard  0] database - Exception while populating keyspace 'system_schema' with column family 'aggregates' from file '/var/lib/scylla/data/system_schema/aggregates-924c55872e3a345bb10c12f37c1ba895': std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory [/var/lib/scylla/data/system_schema/aggregates-924c55872e3a345bb10c12f37c1ba895/snapshots/sm_20220930165632UTC/me-1007076-big-CompressionInfo.db])
    < t:2022-10-01 13:05:23,785 f:nemesis.py      l:3691 c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.SisyphusMonkey: RebuildStreamingErr Node longevity-mv-si-4d-2022-2-db-node-4a622274-13 [54.75.69.155 | 10.4.3.195] (seed: False) duration -> 908 s

2022-10-01 12:52:49.421 <2022-10-01 12:52:49.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=43db8d81-4ac4-4650-9b6b-25d32dd4c26b: type=FILESYSTEM_ERROR regex=filesystem_error line_number=2967121 node=longevity-mv-si-4d-2022-2-db-node-4a622274-13 2022-10-01T12:52:49+00:00 longevity-mv-si-4d-2022-2-db-node-4a622274-13 !ERR | scylla[115686]: [shard 0] database - Exception while populating keyspace 'system_schema' with column family 'view_virtual_columns' from file '/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa': std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory [/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/snapshots/sm_20220930165632UTC/me-982982-big-Summary.db])

2022-10-01 13:03:48.806 <2022-10-01 13:03:48.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=afd54525-70ca-4030-9014-5e3687075929: type=RUNTIME_ERROR regex=std::runtime_error line_number=2992360 node=longevity-mv-si-4d-2022-2-db-node-4a622274-13 2022-10-01T13:03:48+00:00 longevity-mv-si-4d-2022-2-db-node-4a622274-13 !ERR | scylla[115686]: [shard 0] init - Startup failed: std::runtime_error (Exception while populating keyspace 'system_schema' with column family 'view_virtual_columns' from file '/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa': std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory [/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/snapshots/sm_20220930165632UTC/me-982982-big-Summary.db]))

2022-10-01 13:05:18.082 <2022-10-01 13:05:18.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=43db8d81-4ac4-4650-9b6b-25d32dd4c26b: type=FILESYSTEM_ERROR regex=filesystem_error line_number=2996033 node=longevity-mv-si-4d-2022-2-db-node-4a622274-13 2022-10-01T13:05:18+00:00 longevity-mv-si-4d-2022-2-db-node-4a622274-13 !ERR | scylla[116163]: [shard 0] database - Exception while populating keyspace 'system_schema' with column family 'aggregates' from file '/var/lib/scylla/data/system_schema/aggregates-924c55872e3a345bb10c12f37c1ba895': std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory [/var/lib/scylla/data/system_schema/aggregates-924c55872e3a345bb10c12f37c1ba895/snapshots/sm_20220930165632UTC/me-1007076-big-CompressionInfo.db])


**<<<<<<<**

- Restore Monitor Stack command: `$ hydra investigate show-monitor 4a622274-af57-417f-a1ec-4cc4c89af60e`
- Restore monitor on AWS instance using [Jenkins job](https://jenkins.scylladb.com/view/QA/job/QA-tools/job/hydra-show-monitor/parambuild/?test_id=4a622274-af57-417f-a1ec-4cc4c89af60e)
- Show all stored logs command: `$ hydra investigate show-logs 4a622274-af57-417f-a1ec-4cc4c89af60e`

## Logs:
- **db-cluster-4a622274.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/db-cluster-4a622274.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/db-cluster-4a622274.tar.gz)
- **monitor-set-4a622274.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/monitor-set-4a622274.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/monitor-set-4a622274.tar.gz)
- **loader-set-4a622274.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/loader-set-4a622274.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/loader-set-4a622274.tar.gz)
- **normal-4a622274.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/normal-4a622274.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/normal-4a622274.log.tar.gz)
- **summary-4a622274.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/summary-4a622274.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/summary-4a622274.log.tar.gz)
- **events-4a622274.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/events-4a622274.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/events-4a622274.log.tar.gz)
- **output-4a622274.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/output-4a622274.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/output-4a622274.log.tar.gz)
- **debug-4a622274.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/debug-4a622274.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/debug-4a622274.log.tar.gz)
- **sct-4a622274.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/sct-4a622274.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/sct-4a622274.log.tar.gz)
- **error-4a622274.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/error-4a622274.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/error-4a622274.log.tar.gz)
- **critical-4a622274.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/critical-4a622274.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/critical-4a622274.log.tar.gz)
- **raw_events-4a622274.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/raw_events-4a622274.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/raw_events-4a622274.log.tar.gz)
- **warning-4a622274.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/warning-4a622274.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/warning-4a622274.log.tar.gz)
- **email_data-4a622274.json.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/email_data-4a622274.json.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/email_data-4a622274.json.tar.gz)
- **argus-4a622274.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/argus-4a622274.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/argus-4a622274.log.tar.gz)
- **left_processes-4a622274.log.tar.gz** - [https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/left_processes-4a622274.log.tar.gz](https://cloudius-jenkins-test.s3.amazonaws.com/4a622274-af57-417f-a1ec-4cc4c89af60e/20221001_132903/left_processes-4a622274.log.tar.gz)

[Jenkins job URL](https://jenkins.scylladb.com/job/enterprise-2022.2/job/SCT_Enterprise_Features/job/ICS/job/ics-longevity-mv-si-4days-test/2/)

DoronArazii commented 1 year ago

@raphaelsc can you please have a look

roydahan commented 1 year ago

This issue was left without attention.

I would like @yarongilor to run a reproducer and verify this is not an issue with the test that somehow deleted sstables in a wrong manner when dealing with ICS.

yarongilor commented 1 year ago

@roydahan, the issue did not reproduce after many executions of this nemesis (~100).

fgelcer commented 1 year ago

> @roydahan, issue is not reproduced after many executions of this nemesis (~100).

With RC4, right? In the issue we removed (AFAIU) MV files. Did the reproducer above delete such files too? Also, please attach logs/links for the reproducer here.

yarongilor commented 1 year ago

@fgelcer, the logs show that MV / SI tables were selected many times by this nemesis:

< t:2022-11-12 08:52:27,096 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_last_name_ind_index-0ca62f80610211ed9ccfb66c0d5dc24c/me-12705-* were destroyed
< t:2022-11-12 08:52:27,692 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-12421-* were destroyed
< t:2022-11-12 09:44:05,965 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_first_name-fb6bc900610111ed971449e56488479b/me-13939-* were destroyed
< t:2022-11-12 09:44:06,561 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_first_name-fb6bc900610111ed971449e56488479b/me-13843-* were destroyed
< t:2022-11-12 09:44:07,410 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_last_name_ind_index-0ca62f80610211ed9ccfb66c0d5dc24c/me-13533-* were destroyed
< t:2022-11-12 09:44:07,961 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_first_name-fb6bc900610111ed971449e56488479b/me-13951-* were destroyed
< t:2022-11-12 09:44:08,557 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_last_name_ind_index-0ca62f80610211ed9ccfb66c0d5dc24c/me-13539-* were destroyed
< t:2022-11-12 10:36:58,744 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-12664-* were destroyed
< t:2022-11-12 10:36:59,340 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_first_name_ind_index-0dcde6a0610211ed971449e56488479b/me-12740-* were destroyed
< t:2022-11-12 10:36:59,936 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-12567-* were destroyed
< t:2022-11-12 10:37:00,532 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_first_name-fb6bc900610111ed971449e56488479b/me-12996-* were destroyed
< t:2022-11-12 10:37:01,128 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_first_name-fb6bc900610111ed971449e56488479b/me-13034-* were destroyed
< t:2022-11-12 11:28:52,332 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-15929-* were destroyed
< t:2022-11-12 11:28:52,928 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-15913-* were destroyed
< t:2022-11-12 11:28:53,524 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_address_ind_index-e511fd01615111ed971449e56488479b/me-11768-* were destroyed
< t:2022-11-12 11:28:54,120 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_first_name-fb6bc900610111ed971449e56488479b/me-15887-* were destroyed
< t:2022-11-12 11:28:54,716 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_address-8dadb450615111ed9377bd809198c114/me-12070-* were destroyed
< t:2022-11-12 12:26:19,764 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-14214-* were destroyed
< t:2022-11-12 12:26:20,360 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users-08ca54e0610211ed867e2074936ab04d/me-36218-* were destroyed
< t:2022-11-12 12:26:20,956 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-14217-* were destroyed
< t:2022-11-12 12:26:21,556 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users-f78fee60610111ed867e2074936ab04d/me-33130-* were destroyed
< t:2022-11-12 12:26:22,152 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_initials-8ebb7ad0615111edbee238a385d7b02f/me-13685-* were destroyed
< t:2022-11-12 13:18:52,636 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_last_name_ind_index-0ca62f80610211ed9ccfb66c0d5dc24c/me-13923-* were destroyed
< t:2022-11-12 13:18:53,233 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-13615-* were destroyed
< t:2022-11-12 13:18:53,829 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_last_access_ind_index-e5df3810615111edbee238a385d7b02f/me-11149-* were destroyed
< t:2022-11-12 13:18:54,424 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_last_name_ind_index-0ca62f80610211ed9ccfb66c0d5dc24c/me-13937-* were destroyed
< t:2022-11-12 13:18:55,020 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_address-8dadb450615111ed9377bd809198c114/me-11814-* were destroyed
< t:2022-11-12 14:11:17,122 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_email-fceb1600610111ed9ccfb66c0d5dc24c/me-15831-* were destroyed
< t:2022-11-12 14:11:17,718 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_initials-8ebb7ad0615111edbee238a385d7b02f/me-14732-* were destroyed
< t:2022-11-12 14:11:18,314 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_first_name_ind_index-0dcde6a0610211ed971449e56488479b/me-15428-* were destroyed
< t:2022-11-12 14:11:18,910 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_first_name_ind_index-0dcde6a0610211ed971449e56488479b/me-15390-* were destroyed
< t:2022-11-12 14:11:19,507 f:nemesis.py      l:988  c:sdcm.nemesis         p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_first_name-fb6bc900610111ed971449e56488479b/me-15948-* were destroyed
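To back the "selected many times" claim with numbers, the DEBUG lines above can be tallied per keyspace and table. This is a minimal sketch, not part of SCT: `DESTROYED_RE` and `count_destroyed` are hypothetical names, and the pattern assumes the exact "Files ... were destroyed" wording and the `/var/lib/scylla/data` layout seen in these logs.

```python
import re
from collections import Counter

# Matches "Files /var/lib/scylla/data/<keyspace>/<table>-<32 hex id>/me-<gen>-* were destroyed".
# The wording and path layout are taken from the log excerpt; other SCT versions may differ.
DESTROYED_RE = re.compile(
    r"Files /var/lib/scylla/data/(?P<keyspace>[^/]+)/(?P<table>[^/]+)-[0-9a-f]{32}/"
    r"me-(?P<generation>\d+)-\* were destroyed"
)


def count_destroyed(lines):
    """Count destroyed-sstable log lines per (keyspace, table)."""
    counts = Counter()
    for line in lines:
        match = DESTROYED_RE.search(line)
        if match:
            counts[(match["keyspace"], match["table"])] += 1
    return counts
```

Feeding it the lines of the SCT log (for example `count_destroyed(open(path))`, with `path` pointing at a downloaded log) gives a per-table count that can be compared between the failing run and the reproducer.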

Installation details

Kernel Version: 5.15.0-1022-aws
Scylla version (or git commit hash): 2022.2.0~rc4-20221106.f5714e0db12f with build-id f4a927b8a00fbcd8d48640835192aeaa7968b1f2
Relocatable Package: http://downloads.scylladb.com/unstable/scylla-enterprise/enterprise-2022.2/relocatable/2022-11-06T15:44:05Z/scylla-enterprise-x86_64-package.tar.gz
Cluster size: 5 nodes (i3.4xlarge)

Scylla Nodes used in this run:

OS / Image: ami-00253925d2028dcef (aws: eu-west-1)

Test: ics-longevity-mv-si-4days-test_RebuildStreamError-only
Test id: dcd1091a-e979-4539-b4f2-e48aded3c599
Test name: enterprise-2022.2/longevity/Reproducers/ics-longevity-mv-si-4days-test_RebuildStreamError-only
Test config file(s):

Logs:

Jenkins job URL

yarongilor commented 1 year ago

The missing paths named in the Scylla error belong to manager snapshots, e.g. snapshots/sm_20220930165632UTC:

2022-10-01T12:52:49+00:00 longevity-mv-si-4d-2022-2-db-node-4a622274-13 !ERR | scylla[115686]: [shard 0] database - Exception while populating keyspace 'system_schema' with column family 'view_virtual_columns' from file '/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa': std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory [/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/snapshots/sm_20220930165632UTC/me-982982-big-Summary.db])
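A small sketch of how the missing path could be pulled out of such an error and checked for a snapshot directory. It assumes the log entry is available as one line. `MissingFile`, `parse_missing_file` and `is_manager_snapshot` are hypothetical helpers, and treating an `sm_` prefix as a manager snapshot tag is an assumption based only on the path seen here.

```python
import re
from typing import NamedTuple, Optional


class MissingFile(NamedTuple):
    path: str
    snapshot_tag: Optional[str]  # directory name under snapshots/, or None


# The bracketed path that follows the "stat failed" message in the Scylla log.
MISSING_RE = re.compile(r"stat failed: No such file or directory \[(?P<path>[^\]]+)\]")
SNAPSHOT_RE = re.compile(r"/snapshots/(?P<tag>[^/]+)/")


def parse_missing_file(log_line: str) -> Optional[MissingFile]:
    """Return the missing path and its snapshot tag, or None if the line does not match."""
    match = MISSING_RE.search(log_line)
    if match is None:
        return None
    path = match["path"]
    snapshot = SNAPSHOT_RE.search(path)
    return MissingFile(path, snapshot["tag"] if snapshot else None)


def is_manager_snapshot(missing: MissingFile) -> bool:
    """Assumption: tags starting with 'sm_' were created by the manager."""
    return missing.snapshot_tag is not None and missing.snapshot_tag.startswith("sm_")
```

Running this over every FILESYSTEM_ERROR event of a run would show whether all missing files sit under snapshot directories or whether some live sstables are affected as well.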

Before this nemesis ran, a manager-backup nemesis ran twice and failed because of the manager snapshot timeout issue https://github.com/scylladb/scylla-manager/issues/3389

@roydahan , we can consider running a reproducer, using both manager-backup and Streaming-error nemesis.

@karol-kokoszka , can you please advise - could scylladb/scylla-manager#3389 be related to this issue?

roydahan commented 1 year ago

No need, this is a test issue.

roydahan commented 1 year ago

@yarongilor take a similar cluster, take a snapshot using the manager, then run the sstable deletion script and check what it deletes (while the node is stopped).
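One way to do that check without deleting anything is a dry run that lists what a deletion pattern would match and separates live sstable files from files under a `snapshots` directory. This is only a sketch: `dry_run_delete` is a hypothetical helper, and the glob patterns in the usage note are assumptions, not the patterns the nemesis actually uses.

```python
import glob
from pathlib import Path


def dry_run_delete(pattern: str):
    """Return (live, in_snapshots) lists of files matching `pattern`; nothing is removed."""
    live, in_snapshots = [], []
    for name in sorted(glob.glob(pattern, recursive=True)):
        path = Path(name)
        if not path.is_file():
            continue
        # Any path component named "snapshots" marks the file as snapshot data.
        if "snapshots" in path.parts:
            in_snapshots.append(name)
        else:
            live.append(name)
    return live, in_snapshots
```

For a table directory `T`, a pattern such as `T/me-3254-*` only reaches files directly inside `T`, while `T/**/me-3254-*` also descends into `T/snapshots/...`. A non-empty second list for the real pattern would point at the test rather than at Scylla.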

fruch commented 1 year ago

I can't find the issue with the discussion.

But we did have this before, when the manager decided to purge a snapshot at the exact time we were restarting the Scylla service.

On one hand, there isn't an API to delete snapshots.

On the other hand, Scylla boot doesn't really need to touch the snapshots; it's not clear why those files are read at boot.

@bhalevy maybe you remember this issue ?

fruch commented 1 year ago

@roydahan @bhalevy

I've found the issue; we encountered it in scylla-cloud: https://github.com/scylladb/scylla-enterprise/issues/2072

They solved it by suspending the manager tasks when they are restarting Scylla.

So it's not exactly solved.

bhalevy commented 1 year ago

We can skip mode validation for the snapshots directory
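As an illustration of the idea only (this is not Scylla's startup code, which is C++), a directory walk can prune `snapshots` subtrees so that files a concurrent purge may remove are never stat-ed. `files_to_validate` and its `skip_dirs` parameter are hypothetical names.

```python
import os


def files_to_validate(data_dir, skip_dirs=("snapshots",)):
    """Yield every file under data_dir, never descending into directories named in skip_dirs."""
    for root, dirs, files in os.walk(data_dir):
        # Editing dirs in place tells os.walk (top-down mode) not to enter the pruned subtrees.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            yield os.path.join(root, name)
```

A caller would then run its ownership and mode checks only on the yielded paths, so a snapshot deleted mid-walk cannot produce a "No such file or directory" failure.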

fruch commented 1 year ago

> We can skip mode validation for the snapshots directory

Can you help us get it onto one of the teams' backlogs?