thelastpickle / cassandra-medusa

Apache Cassandra Backup and Restore Tool
Apache License 2.0

backup-cluster problem with 0.10.1 installed via pip3 #375

Open bkrajmalnik1 opened 3 years ago

bkrajmalnik1 commented 3 years ago


There appears to be an issue with the backup-cluster command. I just installed 0.10.1 via pip3 on a 3-node cluster running Cassandra 3.11.9 on CentOS 7.9. Instead of each node running nodetool snapshot against itself, every node attempts to execute a remote snapshot against the node from which the backup-cluster command was executed.

In my case, the node on which medusa backup-cluster was executed is 10.254.254.81. This is the log snippet from the failed operation:

[2021-06-21 09:39:11,063] INFO: Executing "nodetool -h 10.254.254.81 snapshot -t medusa-test1" on following nodes ['cass01.testdomain.local', 'cass02.testdomain.local', 'cass03.testdomain.local'] with a parallelism/pool size of 500
[2021-06-21 09:39:11,063] DEBUG: Batch #1: Running "nodetool -h 10.254.254.81 snapshot -t medusa-test1" on nodes ['cass01.testdomain.local', 'cass02.testdomain.local', 'cass03.testdomain.local'] parallelism of 3
[2021-06-21 09:39:11,064] DEBUG: Connecting to cass01.testdomain.local..
[2021-06-21 09:39:11,066] DEBUG: Connecting to cass02.testdomain.local..
[2021-06-21 09:39:11,067] DEBUG: Connecting to cass03.testdomain.local..
[2021-06-21 09:39:11,627] DEBUG: Running parsed command sudo -S $SHELL -c "nodetool -h 10.254.254.81 snapshot -t medusa-test1" on cass02.testdomain.local
[2021-06-21 09:39:11,635] DEBUG: Command started
[2021-06-21 09:39:11,664] DEBUG: Running parsed command sudo -S $SHELL -c "nodetool -h 10.254.254.81 snapshot -t medusa-test1" on cass03.testdomain.local
[2021-06-21 09:39:11,671] DEBUG: Command started
[2021-06-21 09:39:11,698] DEBUG: Running parsed command sudo -S $SHELL -c "nodetool -h 10.254.254.81 snapshot -t medusa-test1" on cass01.testdomain.local
[2021-06-21 09:39:11,704] DEBUG: Command started
[2021-06-21 09:39:14,259] ERROR: Job executing "nodetool -h 10.254.254.81 snapshot -t medusa-test1" ran and finished with errors on following nodes: ['cass01.testdomain.local', 'cass02.testdomain.local']

medusa.log

In the above run, nodetool_host was set on each node to point to that node's own IP address. Commenting out the nodetool_host parameter allowed the snapshots to take place.

Therefore, I don't think this is so much a functional issue as one of documentation.



rzvoncek commented 7 months ago

I was able to reproduce this behaviour, and it does seem to be an unexpected behaviour, if not a downright bug.

The problem is that if medusa.ini/cassandra/nodetool_host is configured on the node where we run the backup-cluster command, Medusa then uses that host on every other node it SSHes into to make snapshots and run backups.

The correct behaviour would be to check the medusa.ini on the node it SSHed into and use the host from there.
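
To make the proposed behaviour concrete, here is a minimal sketch of building the snapshot command from each node's own medusa.ini rather than the coordinator's. This is not Medusa's actual code: the function names `nodetool_host_from_ini` and `snapshot_command` are hypothetical, and it assumes the remote node's medusa.ini contents have already been fetched as text (for example over the existing SSH connection).

```python
import configparser
import shlex
from typing import Optional


def nodetool_host_from_ini(ini_text: str) -> Optional[str]:
    """Return [cassandra] nodetool_host from a medusa.ini body, or None if unset.

    Hypothetical helper: ini_text is assumed to be the medusa.ini of the node
    being snapshotted, not of the node that ran backup-cluster.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # fallback covers both a missing section and a missing/commented-out option
    return parser.get("cassandra", "nodetool_host", fallback=None)


def snapshot_command(ini_text: str, tag: str) -> str:
    """Build the nodetool snapshot command for one node from that node's config."""
    host = nodetool_host_from_ini(ini_text)
    parts = ["nodetool"]
    if host:
        # Only pass -h when the target node itself configures a host
        parts += ["-h", host]
    parts += ["snapshot", "-t", tag]
    return " ".join(shlex.quote(p) for p in parts)


if __name__ == "__main__":
    # Illustrative per-node configs; the second has nodetool_host commented out
    per_node_ini = {
        "cass01.testdomain.local": "[cassandra]\nnodetool_host = 10.254.254.81\n",
        "cass02.testdomain.local": "[cassandra]\n; nodetool_host = 10.254.254.82\n",
    }
    for node, ini_text in per_node_ini.items():
        print(node, "->", snapshot_command(ini_text, "medusa-test1"))
```

With this approach a node whose nodetool_host is unset or commented out simply runs nodetool without -h, which matches the workaround reported above.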