thelastpickle / cassandra-medusa

Apache Cassandra Backup and Restore Tool

restore-cluster restoring instances from the incorrect location #766

Open chrisjmiller1 opened 6 months ago

chrisjmiller1 commented 6 months ago


Hi,

I'm currently testing Medusa with OCI (Oracle Cloud Infrastructure) and have noticed that when I run a restore-cluster command, the source path for node1 is used for the subsequent nodes, i.e. node2 and node3.

Command:

```
medusa restore-cluster --backup-name 20240521-cluster-1
```
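For context, the storage backend is wired up through Medusa's s3_compatible provider pointing at the OCI endpoint. A minimal sketch of that kind of `[storage]` section (bucket, endpoint, and credentials path are placeholders, and exact keys can vary between Medusa versions):

```ini
[storage]
; s3_compatible provider pointed at the OCI object storage S3 endpoint
storage_provider = s3_compatible
bucket_name = bucketname
key_file = /etc/medusa/credentials
host = idxxxxxxxxxxx.compat.objectstorage.us-ashburn-1.oraclecloud.com
port = 443
secure = True
```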

The output from the initial command looks good:

```
[2024-05-22 14:42:20,019] INFO: About to restore on node1 using {'source': ['node1'], 'seed': True} as backup source
[2024-05-22 14:42:20,019] INFO: About to restore on node2 using {'source': ['node2'], 'seed': True} as backup source
[2024-05-22 14:42:20,019] INFO: About to restore on node3 using {'source': ['node3'], 'seed': True} as backup source
```

But when I look in the medusa.log files for node2 and node3, I see the following, which demonstrates that the incorrect source path is being used.

Node 2:

```
[2024-05-22 14:46:40,709] DEBUG: aws --endpoint-url https://idxxxxxxxxxxx.compat.objectstorage.us-ashburn-1.oraclecloud.com:443 s3 cp s3://bucketname/node1/20240521-cluster-1/meta/tokenmap.json /tmp/medusa-restore-3ea01bff-216f-4536-8d6d-5809dae267de
```

Node 3:

```
[2024-05-22 14:49:47,420] DEBUG: https://idxxxxxxxxxxx.compat.objectstorage.us-ashburn-1.oraclecloud.com:443 "HEAD /bucketname/node1/20240521-cluster-1/meta/tokenmap.json HTTP/1.1" 200 0
[2024-05-22 14:49:47,421] DEBUG: aws --endpoint-url https://idxxxxxxxxxxx.compat.objectstorage.us-ashburn-1.oraclecloud.com:443 s3 cp s3://bucketname/node1/20240521-cluster-1/meta/tokenmap.json /tmp/medusa-restore-3f95200e-ed82-46f0-acfb-59c7d6d9a7cb
```
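For anyone wanting to rule out the backup side, here's a quick sketch (assuming boto3 and the placeholder bucket/endpoint from the logs above) that checks whether each node's tokenmap.json exists in the bucket under its own prefix, i.e. whether the backup layout itself is correct and only the restore is picking the wrong prefix:

```python
# Sanity check: does each node's metadata exist under its own prefix?
# Endpoint, bucket, and backup name are placeholders from the logs above.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    endpoint_url="https://idxxxxxxxxxxx.compat.objectstorage.us-ashburn-1.oraclecloud.com:443",
)

for node in ("node1", "node2", "node3"):
    key = f"{node}/20240521-cluster-1/meta/tokenmap.json"
    try:
        s3.head_object(Bucket="bucketname", Key=key)
        print(f"{key}: present")
    except ClientError as e:
        print(f"{key}: missing ({e.response['Error']['Code']})")
```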

This then results in the following error (as expected) on the other nodes, presumably because the restored data carries node1's rack, which no longer matches the local snitch configuration:

```
ERROR [main] 2024-05-22 12:59:50,843 CassandraDaemon.java:897 - Cannot start node if snitch's rack (1b) differs from previous rack (1a). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_rack=true.
```

I'm seeing this behavior on 0.15.0 and 0.21.0.

Thanks,

Chris.


rzvoncek commented 6 months ago

Hello and thanks for the report. This indeed looks like a bug, similar to what we've seen in https://github.com/thelastpickle/cassandra-medusa/issues/676.
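If it's like that one, my guess is the per-node download ends up derived from a single shared value rather than from each host-map entry, even though the mapping itself prints correctly. Purely as a hypothetical sketch of that failure mode (not Medusa's actual code):

```python
# Hypothetical illustration of the suspected failure mode, not Medusa code.
host_map = {
    "node1": {"source": ["node1"], "seed": True},
    "node2": {"source": ["node2"], "seed": True},
    "node3": {"source": ["node3"], "seed": True},
}

# Intended behaviour: each target restores from its own source prefix.
for target, entry in host_map.items():
    print(f"{target} <- s3://bucketname/{entry['source'][0]}/20240521-cluster-1/")

# Buggy pattern: a value captured once before the loop is reused for
# every node, so all of them end up downloading node1's backup.
source = next(iter(host_map.values()))["source"][0]
for target in host_map:
    print(f"{target} <- s3://bucketname/{source}/20240521-cluster-1/")
```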

I'll try to find some time to fix this, but I can't promise anything.

chrisjmiller1 commented 2 months ago

Hi @rzvoncek, @adejanovski, just checking in to see how this issue is progressing.

Thanks,

Chris.