Open yarongilor opened 1 year ago
@raphaelsc can you please have a look
This issue has been left without attention.
I would like @yarongilor to run a reproducer and verify that this is not a test issue in which sstables were somehow deleted incorrectly when dealing with ICS.
@roydahan, issue is not reproduced after many executions of this nemesis (~100).
With RC4, right? In the issue we removed (AFAIU) MV files... did the reproducer above do the same? Also, please attach logs/links to the reproducer here.
@fgelcer , the logs show mv / si tables were selected many times for this nemesis:
< t:2022-11-12 08:52:27,096 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_last_name_ind_index-0ca62f80610211ed9ccfb66c0d5dc24c/me-12705-* were destroyed
< t:2022-11-12 08:52:27,692 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-12421-* were destroyed
< t:2022-11-12 09:44:05,965 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_first_name-fb6bc900610111ed971449e56488479b/me-13939-* were destroyed
< t:2022-11-12 09:44:06,561 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_first_name-fb6bc900610111ed971449e56488479b/me-13843-* were destroyed
< t:2022-11-12 09:44:07,410 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_last_name_ind_index-0ca62f80610211ed9ccfb66c0d5dc24c/me-13533-* were destroyed
< t:2022-11-12 09:44:07,961 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_first_name-fb6bc900610111ed971449e56488479b/me-13951-* were destroyed
< t:2022-11-12 09:44:08,557 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_last_name_ind_index-0ca62f80610211ed9ccfb66c0d5dc24c/me-13539-* were destroyed
< t:2022-11-12 10:36:58,744 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-12664-* were destroyed
< t:2022-11-12 10:36:59,340 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_first_name_ind_index-0dcde6a0610211ed971449e56488479b/me-12740-* were destroyed
< t:2022-11-12 10:36:59,936 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-12567-* were destroyed
< t:2022-11-12 10:37:00,532 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_first_name-fb6bc900610111ed971449e56488479b/me-12996-* were destroyed
< t:2022-11-12 10:37:01,128 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_first_name-fb6bc900610111ed971449e56488479b/me-13034-* were destroyed
< t:2022-11-12 11:28:52,332 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-15929-* were destroyed
< t:2022-11-12 11:28:52,928 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-15913-* were destroyed
< t:2022-11-12 11:28:53,524 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_address_ind_index-e511fd01615111ed971449e56488479b/me-11768-* were destroyed
< t:2022-11-12 11:28:54,120 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_first_name-fb6bc900610111ed971449e56488479b/me-15887-* were destroyed
< t:2022-11-12 11:28:54,716 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_address-8dadb450615111ed9377bd809198c114/me-12070-* were destroyed
< t:2022-11-12 12:26:19,764 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-14214-* were destroyed
< t:2022-11-12 12:26:20,360 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users-08ca54e0610211ed867e2074936ab04d/me-36218-* were destroyed
< t:2022-11-12 12:26:20,956 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-14217-* were destroyed
< t:2022-11-12 12:26:21,556 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users-f78fee60610111ed867e2074936ab04d/me-33130-* were destroyed
< t:2022-11-12 12:26:22,152 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_initials-8ebb7ad0615111edbee238a385d7b02f/me-13685-* were destroyed
< t:2022-11-12 13:18:52,636 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_last_name_ind_index-0ca62f80610211ed9ccfb66c0d5dc24c/me-13923-* were destroyed
< t:2022-11-12 13:18:53,233 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_initials_ind_index-0e95f191610211ed80c99f0624507779/me-13615-* were destroyed
< t:2022-11-12 13:18:53,829 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_last_access_ind_index-e5df3810615111edbee238a385d7b02f/me-11149-* were destroyed
< t:2022-11-12 13:18:54,424 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_last_name_ind_index-0ca62f80610211ed9ccfb66c0d5dc24c/me-13937-* were destroyed
< t:2022-11-12 13:18:55,020 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_address-8dadb450615111ed9377bd809198c114/me-11814-* were destroyed
< t:2022-11-12 14:11:17,122 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_email-fceb1600610111ed9ccfb66c0d5dc24c/me-15831-* were destroyed
< t:2022-11-12 14:11:17,718 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_initials-8ebb7ad0615111edbee238a385d7b02f/me-14732-* were destroyed
< t:2022-11-12 14:11:18,314 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_first_name_ind_index-0dcde6a0610211ed971449e56488479b/me-15428-* were destroyed
< t:2022-11-12 14:11:18,910 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/sec_index/users_first_name_ind_index-0dcde6a0610211ed971449e56488479b/me-15390-* were destroyed
< t:2022-11-12 14:11:19,507 f:nemesis.py l:988 c:sdcm.nemesis p:DEBUG > sdcm.nemesis.RebuildStreamingErrMonkey: Files /var/lib/scylla/data/mview/users_by_first_name-fb6bc900610111ed971449e56488479b/me-15948-* were destroyed
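To make the "selected many times" claim easy to check, here is a minimal sketch that tallies the "were destroyed" lines per keyspace and table. The regex and the helper name `count_destroyed` are assumptions based only on the log format shown above; they are not part of SCT.

```python
import re
from collections import Counter

# Matches the "Files <path>/<prefix>-* were destroyed" lines shown above.
# Keyspace and table names are read from the path under /var/lib/scylla/data/.
DESTROYED_RE = re.compile(
    r"Files /var/lib/scylla/data/(?P<keyspace>[^/]+)/(?P<table>[^/]+)-[0-9a-f]{32}/"
    r"(?P<prefix>\S+)-\* were destroyed"
)


def count_destroyed(lines):
    """Count destroyed-sstable log lines per (keyspace, table)."""
    counts = Counter()
    for line in lines:
        match = DESTROYED_RE.search(line)
        if match:
            counts[(match["keyspace"], match["table"])] += 1
    return counts
```

Feeding it the nemesis log lines (for example, from an open file object) gives a per-table count that can be compared against the MV / SI tables in the schema.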
Kernel Version: 5.15.0-1022-aws
Scylla version (or git commit hash): 2022.2.0~rc4-20221106.f5714e0db12f
with build-id f4a927b8a00fbcd8d48640835192aeaa7968b1f2
Relocatable Package: http://downloads.scylladb.com/unstable/scylla-enterprise/enterprise-2022.2/relocatable/2022-11-06T15:44:05Z/scylla-enterprise-x86_64-package.tar.gz
Cluster size: 5 nodes (i3.4xlarge)
Scylla Nodes used in this run:
OS / Image: ami-00253925d2028dcef
(aws: eu-west-1)
Test: ics-longevity-mv-si-4days-test_RebuildStreamError-only
Test id: dcd1091a-e979-4539-b4f2-e48aded3c599
Test name: enterprise-2022.2/longevity/Reproducers/ics-longevity-mv-si-4days-test_RebuildStreamError-only
Test config file(s):
$ hydra investigate show-monitor dcd1091a-e979-4539-b4f2-e48aded3c599
$ hydra investigate show-logs dcd1091a-e979-4539-b4f2-e48aded3c599
The missing paths named in the Scylla error belong to Manager snapshots, e.g. snapshots/sm_20220930165632UTC:
2022-10-01T12:52:49+00:00 longevity-mv-si-4d-2022-2-db-node-4a622274-13 !ERR | scylla[115686]: [shard 0] database - Exception while populating keyspace 'system_schema' with column family 'view_virtual_columns' from file '/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa': std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory [/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/snapshots/sm_20220930165632UTC/me-982982-big-Summary.db])
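A quick way to check whether every such error points into a snapshot directory is to pull the missing path out of the message and look for a `snapshots/<tag>/` component. The sketch below assumes the error text keeps the `No such file or directory [<path>]` shape seen here and that the log line is on a single line; the function name is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumes the message format seen in this issue: "... No such file or directory [<path>]".
MISSING_PATH_RE = re.compile(r"No such file or directory\s*\[(?P<path>[^\]]+)\]")


def snapshot_tag_of_missing_file(log_line):
    """Return the snapshot tag if the missing file lies under snapshots/<tag>/, else None."""
    match = MISSING_PATH_RE.search(log_line)
    if match is None:
        return None
    parts = PurePosixPath(match["path"]).parts
    if "snapshots" in parts:
        idx = parts.index("snapshots")
        # The tag must be followed by at least a file name.
        if idx + 1 < len(parts) - 1:
            return parts[idx + 1]
    return None
```

A tag starting with `sm_` would be consistent with the Manager snapshot naming seen in the error above.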
Before this nemesis ran, a manager-backup nemesis ran twice and failed because of the Manager snapshot timeout issue https://github.com/scylladb/scylla-manager/issues/3389
@roydahan , we can consider running a reproducer, using both manager-backup and Streaming-error nemesis.
@karol-kokoszka , can you please advise - could scylladb/scylla-manager#3389 be related to this issue?
No need, this is a test issue.
@yarongilor take a similar cluster, take a snapshot using a manager, then run the script of the sstable deletion and check what it deletes (when the node is stopped)
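One possible way to do the "check what it deletes" step is to list the table directory before and after running the deletion script and split the removed files into live sstables versus files under `snapshots/`. This is only a sketch; `list_files` and `split_deleted` are hypothetical helpers, not part of SCT or the nemesis code.

```python
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def split_deleted(before, after):
    """Split files present in `before` but missing from `after` into (live, snapshot) lists."""
    live, snapshot = [], []
    for path in sorted(before - after):
        # A file counts as a snapshot file if any parent directory is named "snapshots".
        in_snapshot = "snapshots" in path.split(os.sep)[:-1]
        (snapshot if in_snapshot else live).append(path)
    return live, snapshot
```

With the node stopped, call `list_files` on the data directory, run the deletion script, call `list_files` again, and pass both sets to `split_deleted`. A non-empty snapshot list would mean the test itself is removing snapshot files.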
I can't find the issue with the discussion.
But we did have this before, when the manager decided to purge a snapshot at the exact time we were restarting the scylla service.
On the one hand, there isn't an API to delete snapshots
On the other hand, Scylla boot doesn't really need to touch the snapshots, not clear why those files are read at boot.
@bhalevy maybe you remember this issue ?
@roydahan @bhalevy
I've found the issue, we encounter it in scylla-cloud: https://github.com/scylladb/scylla-enterprise/issues/2072
They solved it by suspending the manager tasks while they are restarting scylla.
so it's not exactly solved.
We can skip mode validation for the snapshots directory
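To illustrate the idea only (Scylla itself is C++, and this is not its code), here is a minimal Python sketch of a directory scan that never descends into `snapshots` and tolerates files that disappear between listing and `stat`. The function name and the `skip_dirs` parameter are assumptions made for the example.

```python
import os


def stat_table_files(table_dir, skip_dirs=("snapshots",)):
    """Stat regular files under table_dir, pruning skip_dirs and ignoring files removed mid-walk."""
    results = {}
    for dirpath, dirnames, filenames in os.walk(table_dir):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                results[path] = os.stat(path).st_mode
            except FileNotFoundError:
                # The file was removed between listing and stat, e.g. by a concurrent purge.
                continue
    return results
```

Pruning the directory avoids the race entirely for snapshot files, while the `FileNotFoundError` handler covers the remaining window for any other file removed during the scan.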
Can you help us get it onto some team's backlog?
Installation details
Kernel Version: 5.15.0-1019-aws
Scylla version (or git commit hash): 2022.2.0~rc2-20220919.75d087a2b75a
with build-id 463f1a57b82041a6c6b6441f0cbc26c8ad93091e
Relocatable Package: http://downloads.scylladb.com/downloads/scylla-enterprise/relocatable/scylladb-2022.2/scylla-enterprise-x86_64-package-2022.2.0-rc2.0.20220919.75d087a2b75a.tar.gz
Cluster size: 5 nodes (i3.4xlarge)
Scylla Nodes used in this run:
OS / Image: ami-00bd31f22bcf5ae1a
(aws: eu-west-1)
Test: ics-longevity-mv-si-4days-test
Test id: 4a622274-af57-417f-a1ec-4cc4c89af60e
Test name: enterprise-2022.2/SCT_Enterprise_Features/ICS/ics-longevity-mv-si-4days-test
Test config file(s):
Issue description
Scenario:
2022-10-01 12:52:49.421 <2022-10-01 12:52:49.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=43db8d81-4ac4-4650-9b6b-25d32dd4c26b: type=FILESYSTEM_ERROR regex=filesystem_error line_number=2967121 node=longevity-mv-si-4d-2022-2-db-node-4a622274-13 2022-10-01T12:52:49+00:00 longevity-mv-si-4d-2022-2-db-node-4a622274-13 !ERR | scylla[115686]: [shard 0] database - Exception while populating keyspace 'system_schema' with column family 'view_virtual_columns' from file '/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa': std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory [/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/snapshots/sm_20220930165632UTC/me-982982-big-Summary.db])
2022-10-01 13:03:48.806 <2022-10-01 13:03:48.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=afd54525-70ca-4030-9014-5e3687075929: type=RUNTIME_ERROR regex=std::runtime_error line_number=2992360 node=longevity-mv-si-4d-2022-2-db-node-4a622274-13 2022-10-01T13:03:48+00:00 longevity-mv-si-4d-2022-2-db-node-4a622274-13 !ERR | scylla[115686]: [shard 0] init - Startup failed: std::runtime_error (Exception while populating keyspace 'system_schema' with column family 'view_virtual_columns' from file '/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa': std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory [/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/snapshots/sm_20220930165632UTC/me-982982-big-Summary.db]))
2022-10-01 13:05:18.082 <2022-10-01 13:05:18.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=43db8d81-4ac4-4650-9b6b-25d32dd4c26b: type=FILESYSTEM_ERROR regex=filesystem_error line_number=2996033 node=longevity-mv-si-4d-2022-2-db-node-4a622274-13 2022-10-01T13:05:18+00:00 longevity-mv-si-4d-2022-2-db-node-4a622274-13 !ERR | scylla[116163]: [shard 0] database - Exception while populating keyspace 'system_schema' with column family 'aggregates' from file '/var/lib/scylla/data/system_schema/aggregates-924c55872e3a345bb10c12f37c1ba895': std::filesystem::__cxx11::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory [/var/lib/scylla/data/system_schema/aggregates-924c55872e3a345bb10c12f37c1ba895/snapshots/sm_20220930165632UTC/me-1007076-big-CompressionInfo.db])