scylladb / scylla-manager

The Scylla Manager
https://manager.docs.scylladb.com/stable/
Other
51 stars 33 forks source link

S3 Backups don't include Alternator keyspace #3165

Closed erikcw closed 10 months ago

erikcw commented 2 years ago

I have backups configured in the operator. I assumed the whole cluster would be backed up, but when I examine the structure on S3, it looks like just the system tables, and not my Alternator tables are bing backed up.

    backups:
    - name: "daily full cluster backup"
      #rateLimit: ["50"]
      location: ["s3:backups.*******.com"]
      interval: "1d"
    - name: "weekly full cluster backup"
      #rateLimit: ["50"]
      location: ["s3:backups.*******.com"]
      interval: "7d"

It looks like it is picking up all of my keyspaces except for alternator ( keyspace.table is: alternator_bidder-prod-matchtable.bidder-prod-matchtable -- over 1 TB of data )

Maybe it doesn’t like the - in the name?

Here's the output of the task from sctool:

+-----------------------------------+-----------------------------+----------+----------+----------+--------------+--------+------------------------+------------------------+
| Keyspace                          | Table                       | Progress | Size     | Success  | Deduplicated | Failed | Started at             | Completed at           |
+-----------------------------------+-----------------------------+----------+----------+----------+--------------+--------+------------------------+------------------------+
| system_auth                       | role_members                |     100% |       0B |       0B |           0B |     0B | 23 May 22 12:25:34 UTC | 23 May 22 12:25:34 UTC |
| system_auth                       | roles                       |     100% |     5KiB |     5KiB |         5KiB |     0B | 23 May 22 12:25:34 UTC | 23 May 22 12:25:34 UTC |
+-----------------------------------+-----------------------------+----------+----------+----------+--------------+--------+------------------------+------------------------+
| system_distributed                | cdc_streams_descriptions_v2 |     100% |   169KiB |   169KiB |       169KiB |     0B | 23 May 22 12:25:35 UTC | 23 May 22 12:25:35 UTC |
| system_distributed                | view_build_status           |     100% |       0B |       0B |           0B |     0B | 23 May 22 12:25:35 UTC | 23 May 22 12:25:35 UTC |
| system_distributed                | cdc_generation_descriptions |     100% |   169KiB |   169KiB |       169KiB |     0B | 23 May 22 12:25:35 UTC | 23 May 22 12:25:35 UTC |
| system_distributed                | cdc_generation_timestamps   |     100% |     5KiB |     5KiB |         5KiB |     0B | 23 May 22 12:25:35 UTC | 23 May 22 12:25:35 UTC |
+-----------------------------------+-----------------------------+----------+----------+----------+--------------+--------+------------------------+------------------------+
| system_traces                     | node_slow_log_time_idx      |     100% |       0B |       0B |           0B |     0B | 23 May 22 12:25:35 UTC | 23 May 22 12:25:35 UTC |
| system_traces                     | sessions                    |     100% |       0B |       0B |           0B |     0B | 23 May 22 12:25:35 UTC | 23 May 22 12:25:35 UTC |
| system_traces                     | node_slow_log               |     100% |       0B |       0B |           0B |     0B | 23 May 22 12:25:35 UTC | 23 May 22 12:25:35 UTC |
| system_traces                     | sessions_time_idx           |     100% |       0B |       0B |           0B |     0B | 23 May 22 12:25:35 UTC | 23 May 22 12:25:35 UTC |
| system_traces                     | events                      |     100% |       0B |       0B |           0B |     0B | 23 May 22 12:25:35 UTC | 23 May 22 12:25:35 UTC |
+-----------------------------------+-----------------------------+----------+----------+----------+--------------+--------+------------------------+------------------------+
| alternator_bidder-prod-matchtable | bidder-prod-matchtable      |     100% |       0B |       0B |           0B |     0B |                        |                        |
+-----------------------------------+-----------------------------+----------+----------+----------+--------------+--------+------------------------+------------------------+

Original discussion: https://scylladb-users.slack.com/archives/CFFVC7EA0/p1653147144460269

karol-kokoszka commented 1 year ago

alternator_bidder-prod-matchtable is not a valid keyspace name.

cassandra@cqlsh> CREATE KEYSPACE "alternator_bidder-prod-matchtable"  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};
InvalidRequest: Error from server: code=2200 [Invalid query] message=""alternator_bidder-prod-matchtable" is not a valid keyspace name"
cassandra@cqlsh> CREATE KEYSPACE "alternator_bidder_prod_matchtable"  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};
cassandra@cqlsh> 

Maybe it doesn’t like the - in the name?

Yes, it doesn't.

See https://opensource.docs.scylladb.com/stable/cql/ddl.html#common-definitions

nyh commented 1 year ago

So why was this issue closed if there's a real bug? (I don't know if it's a bug in scylla-manager, or something else).

Indeed, CQL doesn't allow minuses in table names, but the rules for DynamoDB table names are different - see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.NamingRules. In particular, both minus sign and dot are allowed in DynamoDB table names. The underlying Scylla database has no problems with dots or minuses in table names, and neither do "nodetool" tools which take a keyspace name and table name as separate parameters.

Does scylla-manager use CQL to create or read those keyspaces, or looks directly at the filesystem? If the latter, why does it matter which names CQL allows?

karol-kokoszka commented 1 year ago

I was too fast with closing this issue...

My plan was to create issue that requests to add integration-tests validating backups for alternator. We use CQL session in tests now to create the keyspace and populate the data before issuing the backup.

I will create the relevant task shortly, so that we use alernator API to create the test data.

CQL session is needed for backups if end user defines username and password when adding the cluster to the manager. When credentials are provided, manager dumps the schema in CQL and keeps it as a part of backup.

mykaul commented 10 months ago

@karol-kokoszka - any updates?

karol-kokoszka commented 10 months ago

We do not officially support alternator backups and we don't have any test cases validating it.

https://docs.google.com/presentation/d/10gkVgSBUZidQmCmf3crUFivr3GHiZ_v7YnqkxgZw7CM/edit#slide=id.g823c928c49_0_0 Above is the link to one of the webinars about Alternator in K8s. Backup/restore is marked as not supported.

@mykaul Let me forward you one of the email threads about backup/restore tests for Alternator.

We haven't started implementing any integration test covering alternator backup/restore neither. We can raise it on planning.

mykaul commented 10 months ago

@karol-kokoszka - it'd be great to get a rough estimate of the complexity to implement it - we (already) have 2 customer issues around it. @tzach - please prioritize / de-prioritize it.

tzach commented 10 months ago

@karol-kokoszka the assumption was Alternator backup works. The K8s use case is not relevant. In any case, this look both simple to fix and urgent. Lets release a patch release for this ASAP.

tomer-sandler commented 10 months ago

We do not officially support alternator backups and we don't have any test cases validating it.

https://docs.google.com/presentation/d/10gkVgSBUZidQmCmf3crUFivr3GHiZ_v7YnqkxgZw7CM/edit#slide=id.g823c928c49_0_0 Above is the link to one of the webinars about Alternator in K8s. Backup/restore is marked as not supported.

@mykaul Let me forward you one of the email threads about backup/restore tests for Alternator.

We haven't started implementing any integration test covering alternator backup/restore neither. We can raise it on planning.

We have a GA product with production customers and also OSS users using it in production, and they rely on Scylla Manager to do what Scylla advertise it does (handle backup + repair). And we don't even have a disclaimer or documented limitation that SM does not support Alternator backups ? 😞

tomer-sandler commented 10 months ago

@karol-kokoszka the assumption was Alternator backup works. The K8s use case is not relevant. In any case, this look both simple to fix and urgent. Lets release a patch release for this ASAP.

Assumption without testing? What happened to the "Not tested == Not working/Supported" guideline? From Support perspective we request not only a "fix" for the issues 2 customers are facing, but also a complete testing suite to ensure there are no other gaps here. Thanks!

hopugop commented 10 months ago

According to DynamoDB's Naming Convention documentation, the following are allowed on table names (as well as attributes): Table names and index names must be between 3 and 255 characters long, and can contain only the following characters:

As far as I can tell from the regex on this part of the code we do not seek to include the dot . as well.

karol-kokoszka commented 10 months ago

@Michal-Leszczynski please check @hopugop suggestion.