scylladb / scylla-manager

The Scylla Manager
https://manager.docs.scylladb.com/stable/

[scylla-manager-agent download-files]: don't download system_schema tables from the backup #3019

Open vladzcloudius opened 2 years ago

vladzcloudius commented 2 years ago

Version: 2.6

Description: According to the --dry-run output, the agent is downloading not only user tables but also system_xxx keyspaces, and in particular system_schema:

$ sudo -u scylla scylla-manager-agent download-files -L gcs:supporteng-bucket -n 2d8045e7-51a4-42c4-bb8f-5e8aa77a8228 -T sm_20220131211012UTC -d /var/lib/scylla/data/ --dry-run
{"L":"ERROR","T":"2022-02-03T00:49:04.088Z","N":"rclone","M":"parse instance region: invalid character '<' looking for beginning of value","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/go-log@v0.0.6/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/go-log@v0.0.6/logger.go:84\ngithub.com/scylladb/scylla-manager/pkg/rclone.RedirectLogPrint.func1\n\tgithub.com/scylladb/scylla-manager/pkg/rclone/logger.go:19\ngithub.com/rclone/rclone/fs.LogPrintf\n\tgithub.com/rclone/rclone@v1.51.0/fs/log.go:152\ngithub.com/rclone/rclone/fs.Errorf\n\tgithub.com/rclone/rclone@v1.51.0/fs/log.go:167\ngithub.com/scylladb/scylla-manager/pkg/rclone.awsRegionFromMetadataAPI\n\tgithub.com/scylladb/scylla-manager/pkg/rclone/aws.go:42\ngithub.com/scylladb/scylla-manager/pkg/rclone.(*S3Options).AutoFill\n\tgithub.com/scylladb/scylla-manager/pkg/rclone/options.go:131\ngithub.com/scylladb/scylla-manager/pkg/rclone.RegisterS3Provider\n\tgithub.com/scylladb/scylla-manager/pkg/rclone/providers.go:52\nmain.setupCommand\n\tgithub.com/scylladb/scylla-manager/pkg/cmd/agent/setup.go:35\nmain.glob..func5\n\tgithub.com/scylladb/scylla-manager/pkg/cmd/agent/download_files.go:71\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.1.1/command.go:850\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.1.1/command.go:958\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.1.1/command.go:895\nmain.main\n\tgithub.com/scylladb/scylla-manager/pkg/cmd/agent/main.go:12\nruntime.main\n\truntime/proc.go:225"}
Cluster:        XXX (299bbcb0-35ee-4cf5-a701-80976e7728f3)
Datacenter:     datacenter1
Node:           10.240.0.93 (2d8045e7-51a4-42c4-bb8f-5e8aa77a8228)
Time:           2022-01-31 21:10:12 +0000 UTC
Size:           131.102M

Download:
  - keyspace1.standard1 (130.362M) to /var/lib/scylla/data/keyspace1/standard1-dd48c390805d11ec898c000000000000
  - system_distributed.cdc_generation_descriptions (263.018k) to /var/lib/scylla/data/system_distributed/cdc_generation_descriptions-ae6534067b3e34aaaf70d5b90f837c67
  - system_distributed.cdc_streams_descriptions_v2 (157.439k) to /var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36
  - system_schema.tables (58.730k) to /var/lib/scylla/data/system_schema/tables-afddfb9dbc1e30688056eed6c302ba09
  - system_schema.columns (55.316k) to /var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f
  - system_schema.scylla_tables (47.412k) to /var/lib/scylla/data/system_schema/scylla_tables-5d912ff1f7593665b2c88042ab5103dd
  - system_schema.keyspaces (45.604k) to /var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6
  - system_schema.indexes (15.429k) to /var/lib/scylla/data/system_schema/indexes-0feb57ac311f382fba6d9024d305702f
  - system_schema.dropped_columns (15.191k) to /var/lib/scylla/data/system_schema/dropped_columns-5e7583b5f3f43af19a39b7e1d6f5f11f
  - system_schema.computed_columns (14.937k) to /var/lib/scylla/data/system_schema/computed_columns-cc7c7069374033c192a4c3de78dbd2c4
  - system_schema.views (13.040k) to /var/lib/scylla/data/system_schema/views-9786ac1cdd583201a7cdad556410c985
  - system_schema.functions (10.724k) to /var/lib/scylla/data/system_schema/functions-96489b7980be3e14a70166a0b9159450
  - system_schema.aggregates (10.534k) to /var/lib/scylla/data/system_schema/aggregates-924c55872e3a345bb10c12f37c1ba895
  - system_schema.view_virtual_columns (10.370k) to /var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa
  - system_schema.types (10.323k) to /var/lib/scylla/data/system_schema/types-5a8b1ca866023f77a0459273d308917a
  - system_schema.triggers (10.204k) to /var/lib/scylla/data/system_schema/triggers-4df70b666b05325195a132b54005fd48
  - system_distributed.cdc_generation_timestamps (10.003k) to /var/lib/scylla/data/system_distributed/cdc_generation_timestamps-fdf455c4cfec3e009719d7a45436c89d
  - keyspace2.table1 (9.956k) to /var/lib/scylla/data/keyspace2/table1-a2b5dd9082d711eca54b00000000000b

And overwriting the original system_schema doesn't seem to be something that is safe to do, in particular because old table UUIDs (including those of the system_xxx tables) are stored in system_schema.tables.

Instead, the restoration procedure should include restoring the original schema using CQL CREATE ... statements (by simply running the output of DESC SCHEMA, which is supposed to be part of the backup).
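For illustration, a minimal sketch of that CQL-based flow (host names and file paths are placeholders, not scylla-manager functionality):

# At backup time: capture the schema as plain CQL statements.
cqlsh <source-node> -e "DESC SCHEMA" > schema.cql

# At restore time: recreate the schema on the target cluster before loading any sstables.
cqlsh <target-node> -f schema.cql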

vladzcloudius commented 2 years ago

@mmatczuk @slivne @eliransin FYI: unless I'm missing something and unless there is an easy workaround, this is critical and means that the backup restoration procedure described in our docs has been broken since the introduction of download-files.

mmatczuk commented 2 years ago

According to @tgrabiec using the CQL statements does not work in all conditions. And system_schema shall be the last thing to snapshot. It should perhaps be restored with the upload dir.

tgrabiec commented 2 years ago

According to @tgrabiec using the CQL statements does not work in all conditions.

IIRC, you're referring to the fact that restoring schema from CQL loses dropped_columns. So it only works for creating a fresh table, not for importing old data.

vladzcloudius commented 2 years ago

According to @tgrabiec using the CQL statements does not work in all conditions.

IIRC, you're referring to the fact that restoring schema from CQL loses dropped_columns. So it only works for creating a fresh table, not for importing old data.

I'm not sure I'm following, @tgrabiec. Why would we care about dropped_columns when we restore user data?

What about my comment about table IDs? Are you restoring them on the destination node too? What about the IDs of the system_xx tables themselves? You kinda rely on the fact that the data in the backup is older than the one in the node, but what if it's not?

Restoring a schema via CQL has always been a standard procedure for both Scylla and Cassandra from day 1. And it's written all over our and Cassandra's documentation. If there is an issue with this procedure I'd like to know: https://docs.scylladb.com/operating-scylla/procedures/backup-restore/restore/#procedure

Pushing sstables of system_schema tables has always been a last-resort hack used when nothing else worked, and we were always doing it only inside the same cluster.

It feels very uncomfortable that we are making it standard now.

vladzcloudius commented 2 years ago

According to @tgrabiec using the CQL statements does not work in all conditions. And system_schema shall be the last thing to snapshot. It should perhaps be restored with the upload dir.

Everything should be restored via the upload dir - this is our recommended way of uploading sstables into a cluster.
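For reference, the upload dir flow mentioned here looks roughly like this (paths are illustrative; note that the table directory name embeds the table ID):

# Copy the backed-up sstables into the table's upload directory...
cp <backup-sstables>/* /var/lib/scylla/data/keyspace1/standard1-<table-id>/upload/
# ...and ask Scylla to load them.
nodetool refresh keyspace1 standard1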

vladzcloudius commented 2 years ago

Ref https://github.com/scylladb/scylla-manager/issues/3020

vladzcloudius commented 2 years ago

This is the error that I get when I load all system_schema tables from the upload first and then try to load a user table's data from the upload:

TASK [Load system_schema tables data from the upload directory] ****************************************************************************************************************************************************************************************
changed: [35.196.92.133] => (item= system_schema.tables )
changed: [35.237.179.49] => (item= system_schema.keyspaces )
changed: [35.196.92.133] => (item= system_schema.columns )
changed: [35.237.179.49] => (item= system_schema.tables )
changed: [35.196.92.133] => (item= system_schema.scylla_tables )
changed: [35.237.179.49] => (item= system_schema.columns )
changed: [35.196.92.133] => (item= system_schema.keyspaces )
changed: [35.237.179.49] => (item= system_schema.scylla_tables )
changed: [35.196.92.133] => (item= system_schema.views )
changed: [35.237.179.49] => (item= system_schema.aggregates )
changed: [35.196.92.133] => (item= system_schema.functions )
changed: [35.237.179.49] => (item= system_schema.computed_columns )
changed: [35.196.92.133] => (item= system_schema.aggregates )
changed: [35.237.179.49] => (item= system_schema.dropped_columns )
changed: [35.196.92.133] => (item= system_schema.view_virtual_columns )
changed: [35.237.179.49] => (item= system_schema.functions )
changed: [35.196.92.133] => (item= system_schema.types )
changed: [35.237.179.49] => (item= system_schema.indexes )
changed: [35.196.92.133] => (item= system_schema.indexes )
changed: [35.237.179.49] => (item= system_schema.triggers )
changed: [35.196.92.133] => (item= system_schema.triggers )
changed: [35.196.92.133] => (item= system_schema.dropped_columns )
changed: [35.237.179.49] => (item= system_schema.types )
changed: [35.196.92.133] => (item= system_schema.computed_columns )
changed: [35.237.179.49] => (item= system_schema.view_virtual_columns )
changed: [35.237.179.49] => (item= system_schema.views )

TASK [Load the rest of tables data from the upload directory] ******************************************************************************************************************************************************************************************
failed: [35.196.92.133] (item= keyspace2.table2 ) => {"ansible_loop_var": "item", "changed": true, "cmd": "nodetool refresh  keyspace2 table2 \n", "delta": "0:00:00.805716", "end": "2022-02-03 14:05:18.614227", "item": " keyspace2.table2 ", "msg": "non-zero return code", "rc": 1, "start": "2022-02-03 14:05:17.808511", "stderr": "", "stderr_lines": [], "stdout": "Using /etc/scylla/scylla.yaml as the config file\nnodetool: Scylla API server HTTP POST to URL '/storage_service/sstables/keyspace2' failed: Keyspace keyspace2 Does not exist\nSee 'nodetool help' or 'nodetool help <command>'.", "stdout_lines": ["Using /etc/scylla/scylla.yaml as the config file", "nodetool: Scylla API server HTTP POST to URL '/storage_service/sstables/keyspace2' failed: Keyspace keyspace2 Does not exist", "See 'nodetool help' or 'nodetool help <command>'."]}
skipping: [35.196.92.133] => (item= system_schema.tables ) 
skipping: [35.196.92.133] => (item= system_schema.columns ) 

Which means that Scylla doesn't pick up the new schema right away, and I'm not surprised.

@mmatczuk I believe we need to use a standard procedure when we restore schema and stay away from hacks.

I see that the backup snapshot already backs up the schema:

image

I couldn't find a way to fetch it using SM or SM-agent APIs however. Is there a way?

tgrabiec commented 2 years ago

According to @tgrabiec using the CQL statements does not work in all conditions.

IIRC, you're referring to the fact that restoring schema from CQL loses dropped_columns. So it only works for creating a fresh table, not for importing old data.

I'm not sure I'm following, @tgrabiec. Why would we care about dropped_columns when we restore user data?

The SSTable reader will fail to read sstables with unknown columns, unless they are marked as dropped in the schema.

So if you're restoring the backup into a fresh cluster, you need to restore schema tables.

If you're just rolling back to a previous snapshot, and you're fine with using the latest schema, you don't do anything with the schema tables, nor with the CQL.
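To illustrate with a hypothetical table (names made up):

cqlsh -e "CREATE TABLE ks.t (pk int PRIMARY KEY, a int, b int);"
# ... data containing column b is written and flushed to sstables ...
cqlsh -e "ALTER TABLE ks.t DROP b;"
# The old sstables still contain cells for b. A schema recreated from plain CREATE
# statements has no record of the dropped column, so the sstable reader hits an
# unknown column; system_schema.dropped_columns is what carries that record.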

tgrabiec commented 2 years ago

According to @tgrabiec using the CQL statements does not work in all conditions.

IIRC, you're referring to the fact that restoring schema from CQL loses dropped_columns. So it only works for creating a fresh table, not for importing old data.

I'm not sure I'm following, @tgrabiec. Why would we care about dropped_columns when we restore user data?

What about my comment about table IDs? Are you restoring them on the destination node too? What about the IDs of the system_xx tables themselves? You kinda rely on the fact that the data in the backup is older than the one in the node, but what if it's not?

I don't understand your comment about IDs.

Tables don't change IDs, ever. That includes system_xx tables.

tgrabiec commented 2 years ago

Of course, you either restore schema tables, or use CQL, not both.

vladzcloudius commented 2 years ago

According to @tgrabiec using the CQL statements does not work in all conditions.

IIRC, you're referring to the fact that restoring schema from CQL loses dropped_columns. So it only works for creating a fresh table, not for importing old data.

I'm not sure I'm following, @tgrabiec. Why would we care about dropped_columns when we restore user data? What about my comment about table IDs? Are you restoring them on the destination node too? What about the IDs of the system_xx tables themselves? You kinda rely on the fact that the data in the backup is older than the one in the node, but what if it's not?

I don't understand your comment about IDs.

Tables don't change IDs, ever. That includes system_xx tables.

When you create a fresh cluster (which you will eventually upload sstables to), it is going to create all kinds of system_xx/system tables, and each is going to get its own ID. And AFAIK they are going to be unique and different from the IDs of the same tables in the source cluster.

And these IDs are going to be stored in the system_schema tables. Am I missing something, @tgrabiec?

And if not - I hope this makes more sense now. And that's why I don't understand how uploading sstables of system_schema from one cluster into a different cluster can be seen as a safe procedure in the general case.

vladzcloudius commented 2 years ago

Hmmm... I see that the IDs of all system_xx tables I checked are the same on my local cluster and on other installations.

@tgrabiec Could you, please, remind me what's the input for the ID generator for KS, CF, columns?

tgrabiec commented 2 years ago

Local (not distributed) system tables have static IDs which are calculated as name-based UUID (so depend on name only).

Distributed tables have IDs assigned during creation as new unique time-based UUID.

Keyspaces are identified by name.

Columns are identified by name.
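A conceptual illustration of the difference (not Scylla's exact algorithm or namespace; needs a recent util-linux uuidgen):

# Name-based UUID: depends only on the input, so every cluster derives the same ID.
uuidgen --md5 --namespace @dns --name "system_schema.tables"
# Time-based UUID: unique per creation, so it differs between clusters.
uuidgen --time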

tgrabiec commented 2 years ago

Uploading system_schema is safe if you want the same set of tables in the target cluster as in the source cluster. It's not safe if you have some other tables in the target cluster.

vladzcloudius commented 2 years ago

Thanks, Tomek. One question though:

Distributed tables have IDs assigned during creation as new unique time-based UUID.

This means that the tables of the system_distributed, system_auth, system_traces and system_distributed_everywhere keyspaces are going to have different IDs than those at the source. How can pushing system_schema data from the source be safe in their context, @tgrabiec?

tgrabiec commented 2 years ago

Thanks, Tomek. One question though:

Distributed tables have IDs assigned during creation as new unique time-based UUID.

This means that the tables of the system_distributed, system_auth, system_traces and system_distributed_everywhere keyspaces are going to have different IDs than those at the source. How can pushing system_schema data from the source be safe in their context, @tgrabiec?

How is it not safe, if the intent is to restore the whole cluster's state?

vladzcloudius commented 2 years ago

@eliransin @slivne I think this is critical

Thanks, Tomek. One question though:

Distributed tables have IDs assigned during creation as new unique time-based UUID.

This means that the tables of the system_distributed, system_auth, system_traces and system_distributed_everywhere keyspaces are going to have different IDs than those at the source. How can pushing system_schema data from the source be safe in their context, @tgrabiec?

How is it not safe, if the intent is to restore the whole cluster's state?

Because system_auth and the other distributed system keyspaces' table IDs are going to be different, and these IDs are encoded in the names of the corresponding directories.

Please, read the Ansible playbook in question to see the full procedure.

In gist, we do the following (all on the destination cluster):

1) Shut the cluster down.
2) Wipe it clean and bootstrap it with the same token ring as in the source cluster. At this point system_auth, system_distributed and the other keyspaces I mentioned above are going to be created with IDs that are different from those in the source cluster.
3) Upload the data, including system_schema, which in particular contains the old IDs of the tables above - BUM!!!
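To make the mismatch concrete (the old ID below is taken from the dry-run output at the top of this issue; the new one is a placeholder):

# Directory created at bootstrap of the destination cluster (new time-based table ID):
/var/lib/scylla/data/system_distributed/cdc_generation_timestamps-<new-time-based-id>/
# Directory referenced by the uploaded system_schema.tables (old ID from the source cluster):
/var/lib/scylla/data/system_distributed/cdc_generation_timestamps-fdf455c4cfec3e009719d7a45436c89d/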

I hope it makes more sense now, @tgrabiec.

karol-kokoszka commented 1 year ago

We don't want to invest our effort into the Ansible script, as the restore is supposed to be done with the Manager since the 3.1 release.

vladzcloudius commented 1 year ago

We don't want to invest our effort into the Ansible script, as the restore is supposed to be done with the Manager since the 3.1 release.

Not sure how closing this issue is related to Ansible, @karol-kokoszka. Could you, please, clarify? In particular how do you plan to restore schema in SM 3.1?

Michal-Leszczynski commented 1 year ago

Could you, please, clarify? In particular how do you plan to restore schema in SM 3.1?

SM 3.1 has the sctool restore --restore-schema command, which restores the schema from backed-up SSTables.

It does so by uploading all backed-up SSTables from all backed-up nodes into each node in the restored cluster. This creates significant data duplication, but because schema SSTables are small, this shouldn't be an issue. The reason for doing this is that it also simulates a repair, so every node should have the correct, identical schema and we don't need to worry about some edge scenarios in terms of compaction/gc_grace_seconds.

This procedure requires the user to restart the whole cluster after the restore, so that nodes can pick up the restored schema.

As for SSTable ID problems, when downloading schema SSTables from all backed-up nodes into a given node, SM renames them (by changing their ID) so that we can avoid name conflicts. SM uses load and stream for uploading the data into the cluster, so it takes care of the rest of the problems associated with SSTable IDs.
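For context, a usage sketch of that command (cluster, location and snapshot tag are placeholders; see the SM 3.1 docs for the exact flags):

sctool restore -c <cluster> -L gcs:<bucket> -T <snapshot-tag> --restore-schema
# After the task finishes, perform a rolling restart of the cluster so that nodes
# pick up the restored schema.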

vladzcloudius commented 1 year ago

As for SSTable ID problems, when downloading schema SSTables from all backed-up nodes into a given node, SM renames them (by changing their ID) so that we can avoid name conflicts. SM uses load and stream for uploading the data into the cluster, so it takes care of the rest of the problems associated with SSTable IDs.

@karol-kokoszka You are confusing SSTable IDs with table IDs. This issue is all about the latter and has nothing to do with the former.

The algorithm you have described doesn't solve the issue in question at all, because the table ID is part of the content of the system_schema tables.

Please, re-read the opening message and let me know if you still need clarifications on the matter.