scylladb / scylla-manager

The Scylla Manager
https://manager.docs.scylladb.com/stable/

MV is repaired before its base table #3651

Closed vladzcloudius closed 11 months ago

vladzcloudius commented 11 months ago

SM version: 3.2.4~0.20231116.9d92c67a

Description: A repair run repairs an MV before its base table:


Every 1.0s: sctool progress -c sharechat-sc-follow-service-prod-in repair/5c23f829-e089-43ab-9b4f-81252cd18723                                              scylla-manager-sc-follow-service-sharechat-asia-1: Wed Dec  6 02:59:33 2023

Run:            42e7aa99-93e0-11ee-afec-4201ac1da092
Status:         RUNNING
Start time:     06 Dec 23 02:36:29 UTC
Duration:       23m3s
Progress:       29%
Intensity:      6/6 (max)
Parallel:       3/3 (max)
Datacenters:
  - asia-south1

╭───────────────────────────────┬────────────────────────────────┬──────────┬──────────╮
│ Keyspace                      │                          Table │ Progress │ Duration │
├───────────────────────────────┼────────────────────────────────┼──────────┼──────────┤
│ audit                         │                      audit_log │ 100%     │ 22s      │
├───────────────────────────────┼────────────────────────────────┼──────────┼──────────┤
│ follow_service                │          follow_by_followee_mv │ 62%      │ 2h58m10s │
│ follow_service                │                      follow_v2 │ 0%       │ 0s       │
├───────────────────────────────┼────────────────────────────────┼──────────┼──────────┤
│ system_auth                   │                role_attributes │ 100%     │ 6s       │
│ system_auth                   │                   role_members │ 100%     │ 6s       │
│ system_auth                   │               role_permissions │ 100%     │ 7s       │
│ system_auth                   │                          roles │ 100%     │ 5s       │
├───────────────────────────────┼────────────────────────────────┼──────────┼──────────┤
│ system_distributed_everywhere │ cdc_generation_descriptions_v2 │ 100%     │ 0s       │
├───────────────────────────────┼────────────────────────────────┼──────────┼──────────┤
│ system_distributed            │    cdc_generation_descriptions │ 100%     │ 6s       │
│ system_distributed            │      cdc_generation_timestamps │ 100%     │ 6s       │
│ system_distributed            │    cdc_streams_descriptions_v2 │ 100%     │ 6s       │
│ system_distributed            │                 service_levels │ 100%     │ 6s       │
│ system_distributed            │              view_build_status │ 100%     │ 6s       │
╰───────────────────────────────┴────────────────────────────────┴──────────┴──────────╯

AFAIR, in 3.2 MVs were supposed to be repaired last.

Schema:

CREATE KEYSPACE follow_service WITH replication = {'class': 'NetworkTopologyStrategy', 'asia-south1': '3'}  AND durable_writes = true;

CREATE TABLE follow_service.follow_v2 (
    follower double,
    followee double,
    following_since double,
    PRIMARY KEY (follower, followee)
) WITH CLUSTERING ORDER BY (followee ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'IncrementalCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

CREATE MATERIALIZED VIEW follow_service.follow_by_followee_mv AS
    SELECT *
    FROM follow_service.follow_v2
    WHERE followee IS NOT null AND following_since IS NOT null
    PRIMARY KEY (followee, following_since, follower)
    WITH CLUSTERING ORDER BY (following_since DESC, follower ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'IncrementalCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

vladzcloudius commented 11 months ago

Or was it the other way around? Now I'm not sure. Is the current behavior intentional? Is it done this way to prevent some race during MV updates in the context of the subsequent base-table repair?
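For illustration only, the ordering I'd expect is roughly the sketch below: a hypothetical helper (not SM's actual code; the function and map names are made up) that stably sorts the repair list so every materialized view comes after the base tables.

```go
// Hypothetical sketch of "base tables before views" ordering.
// baseOf maps a view's qualified name to its base table's qualified name.
package main

import (
	"fmt"
	"sort"
)

func orderBaseBeforeViews(tables []string, baseOf map[string]string) []string {
	ordered := append([]string(nil), tables...)
	sort.SliceStable(ordered, func(i, j int) bool {
		_, iIsView := baseOf[ordered[i]]
		_, jIsView := baseOf[ordered[j]]
		// Base tables (non-views) sort before views; the stable sort
		// preserves the original order within each group.
		return !iIsView && jIsView
	})
	return ordered
}

func main() {
	tables := []string{"follow_service.follow_by_followee_mv", "follow_service.follow_v2"}
	baseOf := map[string]string{
		"follow_service.follow_by_followee_mv": "follow_service.follow_v2",
	}
	fmt.Println(orderBaseBeforeViews(tables, baseOf))
	// Output: [follow_service.follow_v2 follow_service.follow_by_followee_mv]
}
```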

cc @asias

Michal-Leszczynski commented 11 months ago

@vladzcloudius ensuring that the base table is repaired before its MV is done only when SM has CQL credentials to the managed cluster. Is this the case here?
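For context: the view-to-base-table mapping lives in the cluster's schema tables, so SM can only discover it over CQL. A minimal sketch of that lookup (using gocql, with placeholder host and credentials; this is not SM's actual code path) could look like:

```go
// Minimal sketch, assuming CQL credentials are available: read the
// view -> base table mapping from system_schema.views, which is the
// information needed to repair a base table before its MVs.
package main

import (
	"fmt"
	"log"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("10.0.0.1") // placeholder host
	cluster.Authenticator = gocql.PasswordAuthenticator{
		Username: "scylla_manager", // placeholder credentials
		Password: "password",
	}
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	iter := session.Query(
		`SELECT keyspace_name, view_name, base_table_name FROM system_schema.views`,
	).Iter()
	var ks, view, base string
	for iter.Scan(&ks, &view, &base) {
		fmt.Printf("%s.%s is a view of %s.%s\n", ks, view, ks, base)
	}
	if err := iter.Close(); err != nil {
		log.Fatal(err)
	}
}
```

Without these credentials SM falls back to treating views like ordinary tables, which matches the ordering seen in the progress output above.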

Michal-Leszczynski commented 11 months ago

If SM does have credentials, could you provide SM logs for debugging this issue?

Michal-Leszczynski commented 11 months ago

@vladzcloudius ping

vladzcloudius commented 11 months ago

@vladzcloudius ensuring that the base table is repaired before its MV is done only when SM has CQL credentials to the managed cluster. Is this the case here?

Hi, sorry for the delay. Yes, the CQL credentials were not set. Closing.

vladzcloudius commented 11 months ago

Ref https://github.com/scylladb/scylla-manager/issues/3680