thelastpickle / cassandra-medusa

Apache Cassandra Backup and Restore Tool
Apache License 2.0

backup-cluster option fails on ssh connection error #253

Open · sigalits opened this issue 3 years ago

sigalits commented 3 years ago


Trying to run the new backup-cluster option, I get failures that look like an SSH key problem:

```
[cassandra-dba-dev]/home/cassandra >medusa backup-cluster --backup-name test1
INFO: Monitoring provider is noop
INFO: Starting backup test1
WARNING: is ccm : 0
INFO: Creating snapshots on all nodes
INFO: Executing "nodetool snapshot -t medusa-test1" on following nodes ['ip-10-XXX-XX-130.ec2.internal', 'ip-10-XXX-39-XXX.ec2.internal', 'ip-10-XXX-XX-202.ec2.internal', 'ip-10-XXX-XX-54.ec2.internal', 'ip-10--XXX-XX-11.ec2.internal', 'ip-10-XXX-XX-123.ec2.internal'] with a parallelism/pool size of 500
[2021-01-12 10:12:45,824] ERROR: Job executing "nodetool snapshot -t medusa-test1" ran and finished with errors on following nodes: ['ip-10--XXX-XX-130.ec2.internal', 'ip-10--XXX-XX-226.ec2.internal', 'ip-10--XXX-XX-202.ec2.internal', 'ip-10--XXX-XX-54.ec2.internal', 'ip-10-XXX-XX-11.ec2.internal', 'ip-10-162-43-123.ec2.internal']
[2021-01-12 10:12:45,825] INFO: [ip-10-XXX-XX-130.ec2.internal] /bin/bash: nodetool: command not found
[2021-01-12 10:12:45,825] INFO: ip-10-XXX-XX-130.ec2.internal-stdout: /bin/bash: nodetool: command not found
[2021-01-12 10:12:45,825] INFO: [ip-10-XXX-XX-226.ec2.internal] /bin/bash: nodetool: command not found
[2021-01-12 10:12:45,825] INFO: ip-10-XXX-XX-226.ec2.internal-stdout: /bin/bash: nodetool: command not found
[2021-01-12 10:12:45,825] INFO: [ip-10-XXX-XX-202.ec2.internal] /bin/bash: nodetool: command not found
[2021-01-12 10:12:45,825] INFO: ip-10-XXX-XX-202.ec2.internal-stdout: /bin/bash: nodetool: command not found
[2021-01-12 10:12:45,826] INFO: [ip-10-XXX-XX-54.ec2.internal] /bin/bash: nodetool: command not found
[2021-01-12 10:12:45,826] INFO: ip-10-XXX-XX-54.ec2.internal-stdout: /bin/bash: nodetool: command not found
[2021-01-12 10:12:45,826] INFO: [ip-10-XXX-XX-11.ec2.internal] /bin/bash: nodetool: command not found
[2021-01-12 10:12:45,826] INFO: ip-10-XXX-XX-11.ec2.internal-stdout: /bin/bash: nodetool: command not found
[2021-01-12 10:12:45,826] INFO: [ip-10-XXX-XX-123.ec2.internal] /bin/bash: nodetool: command not found
[2021-01-12 10:12:45,826] INFO: ip-10-XXX-XX-123.ec2.internal-stdout: /bin/bash: nodetool: command not found
```

I also tried specifying the username and key file in the command, but got the same errors.

Connecting with plain ssh to each of the servers works:

```
ssh ip-10-XXX-XX-11.ec2.internal
The authenticity of host 'ip-10--XXX-XX-11.ec2.internal (10.XXX.XX.11)' can't be established.
ECDSA key fingerprint is SHA256:Z+MmNEkzuWkcUihkWKt/aY4iNje7sywPTzEjcum3g/A.
ECDSA key fingerprint is MD5:37:43:f1:f6:28:7a:7d:c3:85:62:60:70:eb:d3:b8:cb.
Are you sure you want to continue connecting (yes/no)? yes
```

meaning that the key is OK,

so maybe the ssh command used by medusa is missing these options, which do allow me to connect:

```
ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no
```
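A quick way to test this hypothesis from the node running medusa (a sketch reusing one of the hostnames from the log above):

```bash
# Run the same kind of non-interactive remote command medusa issues, with the
# suggested host-key options added. If the connection succeeds but nodetool
# is still not found, the remote PATH is the problem rather than the key.
ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no \
    ip-10-XXX-XX-11.ec2.internal 'echo $PATH; command -v nodetool'
```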

Thanks Sigalit


sigalits commented 3 years ago

I have added these options to the .ssh/config on each node:

```
Host *
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
```

Sadly it still fails with the same errors,

while this ssh command works fine:

```
ssh ip-10-XXX-XX-11.ec2.internal -C 'nodetool status'
Warning: Permanently added 'ip-10-XXX-XX-11.ec2.internal,10.XXX.XX.11' (ECDSA) to the list of known hosts.
Datacenter: us-east
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.XXX.XX.226  9.89 MiB   256     47.7%             359ca6a6-b945-4435-a304-8b2b9b2f3815  1a
UN  10.XXX.XX.130  9.5 MiB    256     52.3%             e04458d5-42e7-45a9-bb96-a6a7be3791ad  1a
UN  10.XXX.XX.54   11.14 MiB  256     53.1%             a765de06-2132-4bcf-b6f6-ae8331b13f55  1b
UN  10.XXX.XX.202  9.71 MiB   256     46.9%             df6d8ada-7620-4d5a-8590-056734945e15  1b
UN  10.XXX.XX.123  9.67 MiB   256     46.0%             f3830c26-f5c1-4015-aafa-fa878b9348c4  1c
UN  10.XXX.XX.11   10.5 MiB   256     54.0%             45e6447c-87db-4d4b-97ae-e1fcfb2c2b0e  1c
```

LO764640 commented 3 years ago

Can anyone please update whether this issue is fixed? I am getting the issue while running a cluster backup; the code works fine during a single-node backup.

sandeepmallik commented 2 years ago

@LO764640 @sigalits @adejanovski I used a Cassandra tarball setup and found that medusa has issues picking up paths properly when using the backup-cluster option. I followed the procedure below to make medusa work (a verification sketch follows step 2). Also make sure /etc/hosts and DNS are set up properly when resolve_ip_addresses is used in medusa.ini, as in the check below. Hope it helps.
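For the name-resolution point, a quick check on each node (a sketch; the /etc/medusa/medusa.ini path and the peer hostname are placeholders for whatever your setup uses):

```bash
# With resolve_ip_addresses enabled, every node must resolve every other
# node's hostname consistently, via /etc/hosts or DNS.
grep resolve_ip_addresses /etc/medusa/medusa.ini  # expect: resolve_ip_addresses = True
getent hosts node1.example.com                    # hypothetical peer name; should print its IP
```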

1) Set medusa/medusa-wrapper/nodetool/cqlsh paths.

```
$ sudo visudo
Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/opt/cassandra/bin
```

2) Set cassandra conf, cassandra libraries.

```
$ vi /etc/environment
CASSANDRA_CONF=/opt/cassandra/conf/
CLASSPATH=/opt/cassandra/lib/
```
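After both steps, a quick sanity check from the node launching the cluster backup (a sketch; `$NODE` stands for any node in the cluster, and it assumes passwordless sudo is set up for the connecting user):

```bash
# Confirm that a non-interactive SSH session can find the binaries the
# cluster backup invokes, and that sudo's secure_path now covers nodetool.
ssh "$NODE" 'command -v nodetool medusa medusa-wrapper; sudo -n nodetool version'
```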

ajit-devops-2008 commented 2 years ago

With the help of @sandeepmallik's solution we solved the /bin/bash: nodetool: command not found issue.

Now we are getting an error during the upload, as shown below.

```
[2022-05-16 13:34:34,494] INFO: Monitoring provider is noop
[2022-05-16 13:34:35,450] INFO: No backups found in index. Consider running "medusa build-index" if you have some backups
[2022-05-16 13:34:35,450] INFO: Starting backup stage-medusa-backup-16May2022
[2022-05-16 13:34:35,458] WARNING: is ccm : 0
[2022-05-16 13:34:35,671] INFO: Creating snapshots on all nodes
[2022-05-16 13:34:38,699] INFO: A snapshot medusa-stage-medusa-backup-16May2022 was created on all nodes.
[2022-05-16 13:34:38,699] INFO: Uploading snapshots from nodes to external storage
[2022-05-16 13:34:38,700] INFO: Executing "mkdir -p /tmp/medusa-job-e2b82637-c379-47ab-88fb-2ccca7373b86; cd /tmp/medusa-job-e2b82637-c379-47ab-88fb-2ccca7373b86 && medusa-wrapper sudo medusa -vvv backup-node --backup-name stage-medusa-backup-16May2022 --mode differential" on following nodes ['cassandra.iviws.local', 'cas03.iviws.local', 'cas04.iviws.local', 'cas02.iviws.local'] with a parallelism/pool size of 1
[2022-05-16 13:34:41,143] ERROR: Job executing "mkdir -p /tmp/medusa-job-e2b82637-c379-47ab-88fb-2ccca7373b86; cd /tmp/medusa-job-e2b82637-c379-47ab-88fb-2ccca7373b86 && medusa-wrapper sudo medusa -vvv backup-node --backup-name stage-medusa-backup-16May2022 --mode differential" ran and finished with errors on following nodes: ['cas02.iviws.local', 'cas03.iviws.local', 'cas04.iviws.local', 'cassandra.iviws.local']
[2022-05-16 13:34:41,145] ERROR: Some nodes failed to upload the backup.
[2022-05-16 13:34:41,145] ERROR: This error happened during the cluster backup: Some nodes failed to upload the backup.
```

The stderr shows the following:

```
$ cat /tmp/medusa-job-e2b82637-c379-47ab-88fb-2ccca7373b86/stderr
Traceback (most recent call last):
  File "/home/ec2-user/.local/bin/medusa", line 5, in <module>
    from medusa.medusacli import cli
ModuleNotFoundError: No module named 'medusa'
```
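For what it's worth, the traceback points at a pip --user install: the entry point lives under /home/ec2-user/.local/bin, but the remote job runs medusa under sudo, so root's Python cannot import the medusa package. One possible fix, sketched under the assumption that pip3 is available (cassandra-medusa is the PyPI package name), is to install it system-wide on every node:

```bash
# Install medusa system-wide so that both the plain-user invocation and the
# "medusa-wrapper sudo medusa" invocation resolve the same installation.
sudo pip3 install cassandra-medusa
```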

rzvoncek commented 5 months ago

Hi @ajit-devops-2008, I know it's been too long, but I'm grooming tickets now and stumbled upon this. Did you manage to solve the issues? Is there anything we can help with?