Backup-cluster fails on all authentication - or on second attempt #470

tlb1galaxy commented 2 years ago

Hello, I am trying to back up a new Cassandra cluster (4 CentOS 7 nodes) using local storage (NFS mounts shared by all nodes), and all forms of authentication seem to fail.

I have SSH key authentication configured between all the nodes. I have also started and populated ssh-agent (even though I cannot find any documentation listing this as a requirement).

ENVIRONMENT:

Cassandra version:

[root@cassandranode03 ~]# nodetool version
ReleaseVersion: 3.11.12

Cassandra status:

[root@cassandranode03 ~]# nodetool status
Datacenter: tlb1
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns (effective)  Host ID             Rack
UN  245.13 KiB  256          49.0%             1922256d-REMOVED  compass_cassandra01_rack01
UN  425.6 KiB  256          46.2%             fc247005-REMOVED  compass_cassandra01_rack01
UN  231.49 KiB  256          48.7%             2b094909-REMOVED  compass_cassandra01_rack01
UN  275.85 KiB  256          56.0%             9b858d8c-REMOVED  compass_cassandra01_rack01


[root@cassandranode03 ~]# python --version
Python 2.7.5
[root@cassandranode03 ~]# python3 --version
Python 3.6.8
[root@cassandranode03 ~]# which python
[root@cassandranode03 ~]# which python3


[root@cassandranode03 ~]# medusa --version

PIP packages:

[root@cassandranode03 ~]# pip3 list installed
Package                Version
---------------------- -----------
apache-libcloud        3.3.1
cassandra-driver       3.25.0
cassandra-medusa       0.12.2
cassandra-pylib        0.0.0
certifi                2021.10.8
cffi                   1.15.0
chardet                3.0.4
click                  8.0.4
click-aliases          1.0.1
cryptography           3.3.2
fasteners              0.16
ffwd                   0.0.2
geomet                 0.2.1.post1
gevent                 21.12.0
greenlet               1.1.2
grpcio                 1.44.0
grpcio-health-checking 1.44.0
grpcio-tools           1.44.0
idna                   2.8
importlib-metadata     4.8.3
lockfile               0.12.2
parallel-ssh           2.2.0
pip                    21.3.1
protobuf               3.19.4
psutil                 5.9.0
pycparser              2.21
pycryptodome           3.14.1
python-dateutil        2.8.0
PyYAML                 6.0
requests               2.22.0
retrying               1.3.3
setuptools             39.2.0
six                    1.16.0
ssh-python             0.10.0
ssh2-python            0.22.0
typing_extensions      4.1.1
urllib3                1.25.11
zipp                   3.6.0
zope.event             4.5.0
zope.interface         5.4.0


[root@cassandranode03 ~]# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)


[root@cassandranode03 ~]# df -hT
Filesystem                                             Type      Size  Used Avail Use% Mounted on
devtmpfs                                               devtmpfs  7.9G     0  7.9G   0% /dev
tmpfs                                                  tmpfs     7.9G     0  7.9G   0% /dev/shm
tmpfs                                                  tmpfs     7.9G  8.9M  7.9G   1% /run
tmpfs                                                  tmpfs     7.9G     0  7.9G   0% /sys/fs/cgroup
/dev/mapper/vg01-root                                  xfs       8.4G  1.9G  6.5G  23% /
/dev/sda1                                              xfs       473M  160M  313M  34% /boot
/dev/mapper/vg04_lvm_casslogs01-lvm_casslogs01         xfs        20G   36M   20G   1% /storage/lvm_casslogs01
/dev/mapper/vg01-var                                   xfs       9.4G  553M  8.8G   6% /var
/dev/mapper/vg03_lvm_cassdata01-lvm_cassdata01         xfs        10G   37M   10G   1% /storage/lvm_cassdata01
                                                       nfs4      280G   33M  280G   1% /exports/compassfile01/lvm_backup01/backups/compasscass

SSH auth:

[root@cassandranode03 ~]# ssh
Last login: Wed Apr 27 13:17:00 2022 from
[root@cassandranode01 ~]# exit
Connection to closed.
[root@cassandranode03 ~]# ssh
Last login: Wed Apr 27 14:06:33 2022 from
[root@cassandranode02 ~]# exit
Connection to closed.
[root@cassandranode03 ~]# ssh
Last login: Wed Apr 27 13:57:35 2022 from
[root@cassandranode04 ~]# exit
Connection to closed.


[root@cassandranode03 ~]# ps -aux | grep ssh-agent
root      1780  0.0  0.0  72552  1228 ?        Ss   14:15   0:00 ssh-agent
root      2243  0.0  0.0 112808   976 pts/0    S+   15:11   0:00 grep --color=auto ssh-agent
[root@cassandranode03 ~]# ssh-add -l
4096 SHA256:{{REMOVED}} /root/.ssh/id_rsa (RSA)


Medusa command:

Medusa command output on source
[root@cassandranode03 ~]# medusa -vv backup-cluster --backup-name=manual1321 --mode=full
[2022-04-27 15:13:12,164] DEBUG: Loading configuration from /etc/medusa/medusa.ini
[2022-04-27 15:13:12,168] DEBUG: Resolved to
[2022-04-27 15:13:12,168] DEBUG: Logging to file options: LoggingConfig(enabled='1', file='medusa.log', format='[%(asctime)s] %(levelname)s: %(message)s', level='INFO', maxBytes='20000000', backupCount='30')
[2022-04-27 15:13:12,170] INFO: Monitoring provider is noop
[2022-04-27 15:13:12,170] DEBUG: Loading storage_provider: local
[2022-04-27 15:13:12,173] INFO: No backups found in index. Consider running "medusa build-index" if you have some backups
[2022-04-27 15:13:12,173] INFO: Starting backup manual1321
[2022-04-27 15:13:12,180] DEBUG: This server has systemd: True
[2022-04-27 15:13:12,490] DEBUG: Connecting to cluster, contact points: ['']; protocol version: 66
[2022-04-27 15:13:12,491] DEBUG: Host is now marked up
[2022-04-27 15:13:12,491] DEBUG: [control connection] Opening new connection to
[2022-04-27 15:13:12,493] DEBUG: Sending initial options message for new connection (140075216313648) to
[2022-04-27 15:13:12,494] DEBUG: Defuncting connection (140075216313648) to <Error from server: code=000a [Protocol error] message="Invalid or unsupported protocol version (66); supported versions are (3/v3, 4/v4, 5/v5-beta)">
[2022-04-27 15:13:12,494] DEBUG: Closing connection (140075216313648) to
[2022-04-27 15:13:12,494] DEBUG: Closed socket to
[2022-04-27 15:13:12,494] DEBUG: Exception in read for <GeventConnection(140075216313648) (defunct)>: [Errno 9] Bad file descriptor
[2022-04-27 15:13:12,495] WARNING: Downgrading core protocol version from 66 to 65 for To avoid this, it is best practice to explicitly set Cluster(protocol_version) to the version supported by your cluster.
[2022-04-27 15:13:12,495] DEBUG: Sending initial options message for new connection (140075216313088) to
[2022-04-27 15:13:12,496] DEBUG: Defuncting connection (140075216313088) to <Error from server: code=000a [Protocol error] message="Invalid or unsupported protocol version (65); supported versions are (3/v3, 4/v4, 5/v5-beta)">
[2022-04-27 15:13:12,496] DEBUG: Closing connection (140075216313088) to
[2022-04-27 15:13:12,496] DEBUG: Closed socket to
[2022-04-27 15:13:12,496] DEBUG: Exception in read for <GeventConnection(140075216313088) (defunct)>: [Errno 9] Bad file descriptor
[2022-04-27 15:13:12,496] WARNING: Downgrading core protocol version from 65 to 5 for To avoid this, it is best practice to explicitly set Cluster(protocol_version) to the version supported by your cluster.
[2022-04-27 15:13:12,497] DEBUG: Sending initial options message for new connection (140075216313368) to
[2022-04-27 15:13:12,498] ERROR: Closing connection <GeventConnection(140075216313368)> due to protocol error: Error from server: code=000a [Protocol error] message="Beta version of the protocol used (5/v5-beta), but USE_BETA flag is unset"
[2022-04-27 15:13:12,498] DEBUG: Defuncting connection (140075216313368) to <Error from server: code=000a [Protocol error] message="Beta version of the protocol used (5/v5-beta), but USE_BETA flag is unset">
[2022-04-27 15:13:12,499] DEBUG: Closing connection (140075216313368) to
[2022-04-27 15:13:12,499] DEBUG: Closed socket to
[2022-04-27 15:13:12,499] DEBUG: Exception in read for <GeventConnection(140075216313368) (defunct)>: [Errno 9] Bad file descriptor
[2022-04-27 15:13:12,499] WARNING: Downgrading core protocol version from 5 to 4 for To avoid this, it is best practice to explicitly set Cluster(protocol_version) to the version supported by your cluster.
[2022-04-27 15:13:12,500] DEBUG: Sending initial options message for new connection (140075216313984) to
[2022-04-27 15:13:12,503] DEBUG: Received options response on new connection (140075216313984) from
[2022-04-27 15:13:12,504] DEBUG: No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['snappy', 'lz4']
[2022-04-27 15:13:12,504] DEBUG: Sending StartupMessage on <GeventConnection(140075216313984)>
[2022-04-27 15:13:12,504] DEBUG: Sent StartupMessage on <GeventConnection(140075216313984)>
[2022-04-27 15:13:12,505] DEBUG: Got AuthenticateMessage on new connection (140075216313984) from org.apache.cassandra.auth.PasswordAuthenticator
[2022-04-27 15:13:12,505] DEBUG: Sending SASL-based auth response on <GeventConnection(140075216313984)>
[2022-04-27 15:13:12,602] DEBUG: Connection <GeventConnection(140075216313984)> successfully authenticated
[2022-04-27 15:13:12,603] DEBUG: [control connection] Established new connection <GeventConnection(140075216313984)>, registering watchers and refreshing schema and topology
[2022-04-27 15:13:12,612] DEBUG: [control connection] Refreshing node list and token map using preloaded results
[2022-04-27 15:13:12,613] INFO: Using datacenter 'tlb1' for DCAwareRoundRobinPolicy (via host ''); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes
[2022-04-27 15:13:12,613] DEBUG: [control connection] Found new host to connect to:
[2022-04-27 15:13:12,613] INFO: New Cassandra host <Host: tlb1> discovered
[2022-04-27 15:13:12,613] DEBUG: Handling new host <Host: tlb1> and notifying listeners
[2022-04-27 15:13:12,614] DEBUG: Done preparing queries for new host <Host: tlb1>
[2022-04-27 15:13:12,614] DEBUG: Host is now marked up
[2022-04-27 15:13:12,614] DEBUG: [control connection] Found new host to connect to:
[2022-04-27 15:13:12,614] INFO: New Cassandra host <Host: tlb1> discovered
[2022-04-27 15:13:12,614] DEBUG: Handling new host <Host: tlb1> and notifying listeners
[2022-04-27 15:13:12,614] DEBUG: Done preparing queries for new host <Host: tlb1>
[2022-04-27 15:13:12,615] DEBUG: Host is now marked up
[2022-04-27 15:13:12,615] DEBUG: [control connection] Found new host to connect to:
[2022-04-27 15:13:12,615] INFO: New Cassandra host <Host: tlb1> discovered
[2022-04-27 15:13:12,615] DEBUG: Handling new host <Host: tlb1> and notifying listeners
[2022-04-27 15:13:12,615] DEBUG: Done preparing queries for new host <Host: tlb1>
[2022-04-27 15:13:12,615] DEBUG: Host is now marked up
[2022-04-27 15:13:12,615] DEBUG: [control connection] Finished fetching ring info
[2022-04-27 15:13:12,615] DEBUG: [control connection] Rebuilding token map due to topology changes
[2022-04-27 15:13:12,636] DEBUG: Control connection created
[2022-04-27 15:13:12,637] DEBUG: Initializing connection for host
[2022-04-27 15:13:12,637] DEBUG: Initializing connection for host
[2022-04-27 15:13:12,638] DEBUG: Sending initial options message for new connection (140075216325152) to
[2022-04-27 15:13:12,640] DEBUG: Received options response on new connection (140075216325152) from
[2022-04-27 15:13:12,640] DEBUG: No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['snappy', 'lz4']
[2022-04-27 15:13:12,640] DEBUG: Sending StartupMessage on <GeventConnection(140075216325152)>
[2022-04-27 15:13:12,640] DEBUG: Sent StartupMessage on <GeventConnection(140075216325152)>
[2022-04-27 15:13:12,642] DEBUG: Got AuthenticateMessage on new connection (140075216325152) from org.apache.cassandra.auth.PasswordAuthenticator
[2022-04-27 15:13:12,642] DEBUG: Sending SASL-based auth response on <GeventConnection(140075216325152)>
[2022-04-27 15:13:12,645] DEBUG: Sending initial options message for new connection (140075215933224) to
[2022-04-27 15:13:12,646] DEBUG: Received options response on new connection (140075215933224) from
[2022-04-27 15:13:12,646] DEBUG: No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['snappy', 'lz4']
[2022-04-27 15:13:12,646] DEBUG: Sending StartupMessage on <GeventConnection(140075215933224)>
[2022-04-27 15:13:12,646] DEBUG: Sent StartupMessage on <GeventConnection(140075215933224)>
[2022-04-27 15:13:12,647] DEBUG: Got AuthenticateMessage on new connection (140075215933224) from org.apache.cassandra.auth.PasswordAuthenticator
[2022-04-27 15:13:12,647] DEBUG: Sending SASL-based auth response on <GeventConnection(140075215933224)>
[2022-04-27 15:13:12,739] DEBUG: Connection <GeventConnection(140075216325152)> successfully authenticated
[2022-04-27 15:13:12,739] DEBUG: Finished initializing connection for host
[2022-04-27 15:13:12,739] DEBUG: Added pool for host to session
[2022-04-27 15:13:12,740] DEBUG: Initializing connection for host
[2022-04-27 15:13:12,741] DEBUG: Sending initial options message for new connection (140075215931600) to
[2022-04-27 15:13:12,742] DEBUG: Received options response on new connection (140075215931600) from
[2022-04-27 15:13:12,742] DEBUG: No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['snappy', 'lz4']
[2022-04-27 15:13:12,742] DEBUG: Sending StartupMessage on <GeventConnection(140075215931600)>
[2022-04-27 15:13:12,742] DEBUG: Sent StartupMessage on <GeventConnection(140075215931600)>
[2022-04-27 15:13:12,743] DEBUG: Got AuthenticateMessage on new connection (140075215931600) from org.apache.cassandra.auth.PasswordAuthenticator
[2022-04-27 15:13:12,743] DEBUG: Sending SASL-based auth response on <GeventConnection(140075215931600)>
[2022-04-27 15:13:12,747] DEBUG: Connection <GeventConnection(140075215933224)> successfully authenticated
[2022-04-27 15:13:12,747] DEBUG: Finished initializing connection for host
[2022-04-27 15:13:12,747] DEBUG: Added pool for host to session
[2022-04-27 15:13:12,747] DEBUG: Initializing connection for host
[2022-04-27 15:13:12,748] DEBUG: Sending initial options message for new connection (140075215920936) to
[2022-04-27 15:13:12,750] DEBUG: Received options response on new connection (140075215920936) from
[2022-04-27 15:13:12,750] DEBUG: No available compression types supported on both ends. locally supported: odict_keys([]). remotely supported: ['snappy', 'lz4']
[2022-04-27 15:13:12,750] DEBUG: Sending StartupMessage on <GeventConnection(140075215920936)>
[2022-04-27 15:13:12,750] DEBUG: Sent StartupMessage on <GeventConnection(140075215920936)>
[2022-04-27 15:13:12,751] DEBUG: Got AuthenticateMessage on new connection (140075215920936) from org.apache.cassandra.auth.PasswordAuthenticator
[2022-04-27 15:13:12,751] DEBUG: Sending SASL-based auth response on <GeventConnection(140075215920936)>
[2022-04-27 15:13:12,842] DEBUG: Connection <GeventConnection(140075215931600)> successfully authenticated
[2022-04-27 15:13:12,842] DEBUG: Finished initializing connection for host
[2022-04-27 15:13:12,843] DEBUG: Added pool for host to session
[2022-04-27 15:13:12,847] DEBUG: Connection <GeventConnection(140075215920936)> successfully authenticated
[2022-04-27 15:13:12,848] DEBUG: Finished initializing connection for host
[2022-04-27 15:13:12,848] DEBUG: Added pool for host to session
[2022-04-27 15:13:12,848] DEBUG: Not starting MonitorReporter thread for Insights; not supported by server version 3.11.12 on ControlConnection host
[2022-04-27 15:13:12,848] DEBUG: Started Session with client_id bc5b9d88-c244-417e-8cef-c3c36a3fc7a4 and session_id 85a57b50-bdce-4347-a86e-6f394730fae9
[2022-04-27 15:13:12,848] DEBUG: Checking placement using dc and rack...
[2022-04-27 15:13:12,849] DEBUG: Resolved to
[2022-04-27 15:13:12,850] DEBUG: Checking host against
[2022-04-27 15:13:12,851] DEBUG: Resolved to
[2022-04-27 15:13:12,852] DEBUG: Resolved to
[2022-04-27 15:13:12,852] DEBUG: Resolved to
[2022-04-27 15:13:12,853] DEBUG: Resolved to
[2022-04-27 15:13:12,853] DEBUG: Closing connection (140075216325152) to
[2022-04-27 15:13:12,853] DEBUG: Closed socket to
[2022-04-27 15:13:12,853] DEBUG: Closing connection (140075215933224) to
[2022-04-27 15:13:12,853] DEBUG: Closed socket to
[2022-04-27 15:13:12,854] DEBUG: Closing connection (140075215931600) to
[2022-04-27 15:13:12,854] DEBUG: Closed socket to
[2022-04-27 15:13:12,854] DEBUG: Closing connection (140075215920936) to
[2022-04-27 15:13:12,854] DEBUG: Closed socket to
[2022-04-27 15:13:12,854] DEBUG: Shutting down Cluster Scheduler
[2022-04-27 15:13:12,854] DEBUG: Shutting down control connection
[2022-04-27 15:13:12,854] DEBUG: Closing connection (140075216313984) to
[2022-04-27 15:13:12,855] DEBUG: Closed socket to
[2022-04-27 15:13:12,855] INFO: Creating snapshots on all nodes
[2022-04-27 15:13:12,855] INFO: Executing "nodetool snapshot -t medusa-manual1321" on following nodes ['', '', '', ''] with a parallelism/pool size of 500
[2022-04-27 15:13:12,855] DEBUG: Batch #1: Running "nodetool snapshot -t medusa-manual1321" on nodes ['', '', '', ''] parallelism of 4
[2022-04-27 15:13:12,856] DEBUG: _run_command with read timeout None
[2022-04-27 15:13:12,856] DEBUG: Make client request for host, (host_i, host) in clients: False
[2022-04-27 15:13:12,856] DEBUG: Connecting to
[2022-04-27 15:13:12,856] DEBUG: _run_command with read timeout None
[2022-04-27 15:13:12,857] DEBUG: Make client request for host, (host_i, host) in clients: False
[2022-04-27 15:13:12,857] DEBUG: Connecting to
[2022-04-27 15:13:12,857] DEBUG: _run_command with read timeout None
[2022-04-27 15:13:12,857] DEBUG: Make client request for host, (host_i, host) in clients: False
[2022-04-27 15:13:12,857] DEBUG: Connecting to
[2022-04-27 15:13:12,858] DEBUG: _run_command with read timeout None
[2022-04-27 15:13:12,858] DEBUG: Make client request for host, (host_i, host) in clients: False
[2022-04-27 15:13:12,858] DEBUG: Connecting to
[2022-04-27 15:13:12,859] DEBUG: Starting new session for
[2022-04-27 15:13:12,859] DEBUG: Session started, connecting with existing socket
[2022-04-27 15:13:12,927] DEBUG: Agent auth failed with b"Access denied for 'publickey'. Authentication that can continue: publickey,gssapi-keyex,gssapi-with-mic,password", continuing with other authentication methods
[2022-04-27 15:13:12,928] DEBUG: Trying to authenticate with identity file /root/.ssh/id_rsa
[2022-04-27 15:13:12,941] DEBUG: Authentication with identity file /root/.ssh/id_rsa failed, continuing with other identities
[2022-04-27 15:13:12,941] DEBUG: Starting new session for
[2022-04-27 15:13:12,941] DEBUG: Session started, connecting with existing socket
[2022-04-27 15:13:13,015] DEBUG: Authentication with SSH Agent succeeded.
[2022-04-27 15:13:13,015] DEBUG: Authentication completed successfully - setting session to non-blocking mode
[2022-04-27 15:13:13,015] DEBUG: Opening new channel on
[2022-04-27 15:13:13,015] DEBUG: Channel open session blocked, waiting on socket..
[2022-04-27 15:13:13,015] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,016] DEBUG: Starting new session for
[2022-04-27 15:13:13,016] DEBUG: Session started, connecting with existing socket
[2022-04-27 15:13:13,091] DEBUG: Authentication with SSH Agent succeeded.
[2022-04-27 15:13:13,091] DEBUG: Authentication completed successfully - setting session to non-blocking mode
[2022-04-27 15:13:13,091] DEBUG: Opening new channel on
[2022-04-27 15:13:13,091] DEBUG: Channel open session blocked, waiting on socket..
[2022-04-27 15:13:13,091] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,091] DEBUG: Starting new session for
[2022-04-27 15:13:13,092] DEBUG: Session started, connecting with existing socket
[2022-04-27 15:13:13,165] DEBUG: Authentication with SSH Agent succeeded.
[2022-04-27 15:13:13,166] DEBUG: Authentication completed successfully - setting session to non-blocking mode
[2022-04-27 15:13:13,166] DEBUG: Opening new channel on
[2022-04-27 15:13:13,166] DEBUG: Channel open session blocked, waiting on socket..
[2022-04-27 15:13:13,166] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:13,267] DEBUG: Polling socket with timeout 100
[2022-04-27 15:13:14,591] DEBUG: Channel is at EOF trying to read stderr - reader exiting

Target node - /var/log/secure:

### Target Cassandra node /var/log/secure
Apr 27 15:13:13 cassandranode01 sshd[16492]: Accepted publickey for root from port 33340 ssh2: RSA SHA256:{{REMOVED}}
Apr 27 15:13:13 cassandranode01 sshd[16492]: pam_unix(sshd:session): session opened for user root by (uid=0)
Apr 27 15:13:13 cassandranode01 sudo:    root : TTY=unknown ; PWD=/root ; USER=root ; COMMAND=/bin/bash -c nodetool snapshot -t medusa-manual1321
Apr 27 15:13:13 cassandranode01 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Apr 27 15:13:14 cassandranode01 sudo: pam_unix(sudo:session): session closed for user root
Apr 27 15:13:54 cassandranode01 sshd[16492]: pam_unix(sshd:session): session closed for user root

Source node - /var/log/secure:

### Source Cassandra node /var/log/secure
Apr 27 15:13:22 cassandranode03 sshd[2260]: error: maximum authentication attempts exceeded for root from port 41896 ssh2 [preauth]
Apr 27 15:13:22 cassandranode03 sshd[2260]: Disconnecting: Too many authentication failures [preauth]
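
To compare the two /var/log/secure excerpts above at a glance, a small script can tally sshd outcomes per node. This is only a sketch: the regular expressions are assumptions fitted to the lines quoted in this issue (including the redacted source address), and `summarize_sshd` is a hypothetical helper, not part of Medusa.

```python
import re
from collections import Counter

# Patterns fitted to the sshd lines quoted in this issue (an assumption,
# not a complete grammar for sshd log messages).
ACCEPTED = re.compile(
    r"^\w+\s+\d+ [\d:]+ (\S+) sshd\[\d+\]: Accepted (\S+) for (\S+)"
)
REJECTED = re.compile(
    r"^\w+\s+\d+ [\d:]+ (\S+) sshd\[\d+\]: error: "
    r"maximum authentication attempts exceeded for (\S+)"
)


def summarize_sshd(lines):
    """Count accepted logins and 'too many attempts' rejections per (node, user)."""
    counts = Counter()
    for line in lines:
        m = ACCEPTED.match(line)
        if m:
            counts[(m.group(1), m.group(3), "accepted")] += 1
            continue
        m = REJECTED.match(line)
        if m:
            counts[(m.group(1), m.group(2), "rejected")] += 1
    return counts


# Lines copied from the target and source excerpts above.
sample = [
    "Apr 27 15:13:13 cassandranode01 sshd[16492]: Accepted publickey for root from port 33340 ssh2: RSA SHA256:{{REMOVED}}",
    "Apr 27 15:13:22 cassandranode03 sshd[2260]: error: maximum authentication attempts exceeded for root from port 41896 ssh2 [preauth]",
    "Apr 27 15:13:22 cassandranode03 sshd[2260]: Disconnecting: Too many authentication failures [preauth]",
]

summary = summarize_sshd(sample)
for key, n in sorted(summary.items()):
    print(key, n)
```

On these excerpts the tally shows one accepted login on cassandranode01 and one rejection on cassandranode03. Since cassandranode03 is the node running medusa, this may mean the failing session is the one opened back to the local node, which the manual ssh tests above did not cover. That is an inference from the logs, not something confirmed in this thread.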


cat /etc/medusa/medusa.ini

;stop_cmd = /etc/init.d/cassandra stop
;start_cmd = /etc/init.d/cassandra start
config_file = /etc/cassandra/default.conf/cassandra.yaml
cql_username = cassandraadmin
cql_password = Th1nk0nLAB!
;nodetool_username =  <my nodetool username>
;nodetool_password =  <my nodetool password>
;nodetool_password_file_path = <path to nodetool password file>
;nodetool_host = <host name or IP to use for nodetool>
;nodetool_port = <port number to use for nodetool>
;certfile= <Client SSL: path to rootCa certificate>
;usercert= <Client SSL: path to user certificate>
;userkey= <Client SSL: path to user key>
;sstableloader_ts = <Client SSL: full path to truststore>
;sstableloader_tspw = <Client SSL: password of the truststore>
;sstableloader_ks = <Client SSL: full path to keystore>
;sstableloader_kspw = <Client SSL: password of the keystore>
;sstableloader_bin = <Location of the sstableloader binary if not in PATH>

; Enable this to add the '--ssl' parameter to nodetool. The is expected to be in the normal location
;nodetool_ssl = true

; Command ran to verify if Cassandra is running on a node. Defaults to "nodetool version"
check_running = nodetool version

; Disable/Enable ip address resolving.
; Disabling this can help when fqdn resolving gives different domain names for local and remote nodes
; which makes backup succeed but Medusa sees them as incomplete.
; Defaults to True.
resolve_ip_addresses = True

; When true, almost all commands executed by Medusa are prefixed with `sudo`.
; Does not affect the use_sudo_for_restore setting in the 'storage' section.
; See
; Defaults to True
;use_sudo = True

storage_provider = local
; storage_provider should be either of "local", "google_storage" or "s3"
region = <Region hosting the storage>

; Name of the bucket used for storing backups
bucket_name = cassandra_backups

; JSON key file for service account with access to GCS bucket or AWS credentials file (home-dir/.aws/credentials)
key_file = /etc/medusa/credentials

; Path of the local storage bucket (used only with 'local' storage provider)
base_path = /exports/compassfile01/lvm_backup01/backups/compasscass

; Any prefix used for multitenancy in the same bucket
prefix = tlb1.compass_cassandra01_rack01

;fqdn = <enforce the name of the local node. Computed automatically if not provided.>

; Number of days before backups are purged. 0 means backups don't get purged by age (default)
max_backup_age = 15
; Number of backups to retain. Older backups will get purged beyond that number. 0 means backups don't get purged by count (default)
max_backup_count = 0
; Both thresholds can be defined for backup purge.

; Used to throttle S3 backups/restores:
transfer_max_bandwidth = 50MB/s

; Max number of downloads/uploads. Not used by the GCS backend.
concurrent_transfers = 1

; Size over which S3 uploads will be using the awscli with multi part uploads. Defaults to 100MB.
multi_part_upload_threshold = 104857600

; GC grace period for backed up files. Prevents race conditions between purge and running backups
backup_grace_period_in_days = 10

; When not using sstableloader to restore data on a node, Medusa will copy snapshot files from a
; temporary location into the cassandra data directory. Medusa will then attempt to change the
; ownership of the snapshot files so the cassandra user can access them.
; Depending on how users/file permissions are set up on the cassandra instance, the medusa user
; may need elevated permissions to manipulate the files in the cassandra data directory.
; This option does NOT replace the `use_sudo` option under the 'cassandra' section!
; See:
; Defaults to True
;use_sudo_for_restore = True

;api_profile = <AWS profile to use>

;host = <Optional object storage host to connect to>
;port = <Optional object storage port to connect to>

; Configures the use of SSL to connect to the object storage system.
;secure = True

;aws_cli_path = <Location of the aws cli binary if not in PATH>

;monitoring_provider = <Provider used for sending metrics. Currently either of "ffwd" or "local">

;username = <SSH username to use for restoring clusters>
;key_file = <SSH key for use for restoring clusters. Expected in PEM unencrypted format.>
;port = <SSH port for use for restoring clusters. Default to port 22.
;cert_file = <Path of public key signed certificate file to use for authentication. The corresponding private key must also be provided via key_file parameter>

;health_check = <Which ports to check when verifying a node restored properly. Options are 'cql' (default), 'thrift', 'all'.>
;query = <CQL query to run after a restore to verify it went OK>
;expected_rows = <Number of rows expected to be returned when the query runs. Not checked if not specified.>
;expected_result = <Comma separated string representation of values returned by the query. Checks only 1st row returned, and only if specified>
;enable_md5_checks = <During backups and verify, use md5 calculations to determine file integrity (in addition to size, which is used by default)>

; Controls file logging, disabled by default.
enabled = 1
file = medusa.log
level = INFO

; Control the log output format
format = [%(asctime)s] %(levelname)s: %(message)s

; Size over which log file will rotate
maxBytes = 20000000

; How many log files to keep
backupCount = 30

; Set to true when running in grpc server mode.
; Allows to propagate the exceptions instead of exiting the program.
;enabled = False

; The following settings are only intended to be configured if Medusa is running in containers, preferably in Kubernetes.
;enabled = False
;cassandra_url = <URL of the management API snapshot endpoint. For example:>

; Enables the use of the management API to create snapshots. Falls back to using Jolokia if not enabled.
;use_mgmt_api = True

tlb1galaxy commented 2 years ago

Possible Resolution: After spending a fair amount of time on this, I finally got the 'backup-cluster' command to work. Here is what I had to put in place to get there.

SSH-agent and forwarding:

SSH key-auth:

SUDOERS - secure_path: see existing issue #253.

I had to edit the secure_path line in /etc/sudoers (via visudo) so that it includes two additional paths:

Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/opt/cassandra/bin
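
A quick way to verify a sudoers line before relying on it is to split the secure_path value and look for the directories that matter. The sketch below is illustrative only: `secure_path_dirs` and `missing_dirs` are hypothetical helpers, and the two directories checked are assumed to be the ones added (the last two entries of the line above).

```python
def secure_path_dirs(sudoers_line):
    """Return the directories listed in a 'Defaults secure_path = ...' line."""
    _, _, value = sudoers_line.partition("=")
    # The value may optionally be quoted in sudoers; strip quotes and blanks.
    return [d for d in value.strip().strip('"').split(":") if d]


def missing_dirs(sudoers_line, required):
    """Return the required directories that secure_path does not contain."""
    present = set(secure_path_dirs(sudoers_line))
    return [d for d in required if d not in present]


# Assumption: these are the directories holding the medusa and nodetool binaries.
required = ["/usr/local/bin", "/opt/cassandra/bin"]

fixed = "Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/opt/cassandra/bin"
stock = "Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin"

print(missing_dirs(fixed, required))  # []
print(missing_dirs(stock, required))  # ['/usr/local/bin', '/opt/cassandra/bin']
```

This matters because the /var/log/secure excerpt shows the snapshot command being run through sudo, and sudo resolves commands against secure_path rather than the caller's PATH.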

**/etc/medusa/medusa.ini** - handle SSH keys:
The default example medusa.ini has 2 keys with the same name 'key_file'
  - [storage] - key_file
  - [ssh] - key_file
You need to ensure only one is active:
[storage]
storage_provider = local
; storage_provider should be either of "local", "google_storage" or "s3"
; region = <Region hosting the storage>

; Name of the bucket used for storing backups
bucket_name = cassandra_backups

; JSON key file for service account with access to GCS bucket or AWS credentials file (home-dir/.aws/credentials)
; key_file = /etc/medusa/credentials

; Path of the local storage bucket (used only with 'local' storage provider)
base_path = /exports/compassfile01/lvm_backup01/backups/compasscass

; Any prefix used for multitenancy in the same bucket
prefix = tlb1.compass_cassandra01_rack01

;fqdn = <enforce the name of the local node. Computed automatically if not provided.>

; Number of days before backups are purged. 0 means backups don't get purged by age (default)
max_backup_age = 15
; Number of backups to retain. Older backups will get purged beyond that number. 0 means backups don't get purged by count (default)
max_backup_count = 0
; Both thresholds can be defined for backup purge.

; Used to throttle S3 backups/restores:
transfer_max_bandwidth = 50MB/s

; Max number of downloads/uploads. Not used by the GCS backend.
concurrent_transfers = 1

; Size over which S3 uploads will be using the awscli with multi part uploads. Defaults to 100MB.
multi_part_upload_threshold = 104857600

; GC grace period for backed up files. Prevents race conditions between purge and running backups
backup_grace_period_in_days = 10

[ssh]
username = root
key_file = /root/.ssh/id_rsa
;port = <SSH port for use for restoring clusters. Default to port 22.
;cert_file = <Path of public key signed certificate file to use for authentication. The corresponding private key must also be provided via key_file parameter>
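
To see how an INI parser treats the two `key_file` entries, the fragment can be loaded with Python's standard configparser. This is a minimal sketch under the assumption that the file is read section by section, as configparser does; it does not show how Medusa itself consumes the values, and the `SAMPLE` text is a trimmed copy of the settings above.

```python
import configparser

# Trimmed copy of the settings above, with the section names from this comment.
SAMPLE = """
[storage]
storage_provider = local
; key_file = /etc/medusa/credentials
base_path = /exports/compassfile01/lvm_backup01/backups/compasscass

[ssh]
username = root
key_file = /root/.ssh/id_rsa
"""

# interpolation=None keeps literal '%' characters (such as those in a log
# format string) from being treated as interpolation syntax.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)

# Lines starting with ';' are comments, so [storage] has no key_file here.
storage_key = parser.get("storage", "key_file", fallback=None)
ssh_key = parser.get("ssh", "key_file", fallback=None)

print("storage key_file:", storage_key)
print("ssh key_file:", ssh_key)
```

With this parser the two keys live in separate sections and do not overwrite each other, so commenting out the [storage] one only removes that value.
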

rzvoncek commented 6 months ago

I was not able to reproduce this.

The [ssh] section needs the username/password. It might be nice to default to $USER and ~/.ssh/id_rsa, but that's perhaps for another issue.

The clash of cassandra/key_file and ssh/key_file does not seem to be a thing either.