Hello Team,

As part of POC, we are evaluating the Medusa Backup and Restore tool for the Apache Cassandra Cluster.

Currently, we are experiencing an issue with "medusa backup-cluster". We are getting the Permission denied error for /usr/local/bin/medusa-wrapper executable.

My environment:

/usr/local/bin/medusa --version

0.15.0

cassandra -v

3.11.13

python --version

Python 2.7.5

python3 --version

Python 3.6.8

java -version

openjdk version "1.8.0_382" OpenJDK Runtime Environment (build 1.8.0_382-b05) OpenJDK 64-Bit Server VM (build 25.382-b05, mixed mode)

`[2023-11-30 04:46:23,454] INFO: [10.66.231.208] [err] bash: /usr/local/bin/medusa-wrapper: Permission denied [2023-11-30 04:46:23,454] INFO: 10.66.231.208-stderr: bash: /usr/local/bin/medusa-wrapper: Permission denied [2023-11-30 04:46:23,454] INFO: [10.66.231.212] [err] bash: /usr/local/bin/medusa-wrapper: Permission denied [2023-11-30 04:46:23,454] INFO: 10.66.231.212-stderr: bash: /usr/local/bin/medusa-wrapper: Permission denied [2023-11-30 04:46:23,455] INFO: [10.66.231.248] [err] bash: /usr/local/bin/medusa-wrapper: Permission denied [2023-11-30 04:46:23,455] INFO: 10.66.231.248-stderr: bash: /usr/local/bin/medusa-wrapper: Permission denied [2023-11-30 04:46:23,455] ERROR: Some nodes failed to upload the backup. [2023-11-30 04:46:23,455] ERROR: This error happened during the cluster backup: Some nodes failed to upload the backup. Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/medusa/backup_cluster.py", line 64, in orchestrate backup.execute(cql_session_provider) File "/usr/local/lib/python3.6/site-packages/medusa/backup_cluster.py", line 150, in execute self._upload_backup() File "/usr/local/lib/python3.6/site-packages/medusa/backup_cluster.py", line 177, in _upload_backup raise Exception(err_msg) Exception: Some nodes failed to upload the backup. [2023-11-30 04:46:23,456] ERROR: Something went wrong! Attempting to clean snapshots and exit.

medusa.ini [cassandra] config_file = /usr/local/cassandra/conf/cassandra.yaml cql_username = ********* cql_password = ******** ; When using the following setting there must be files in: ; -/usernamecontaining username ; -/passwordcontaining password ;cql_k8s_secrets_path = <path to kubernetes secrets folder> ;nodetool_username = <my nodetool username> ;nodetool_password = <my nodetool password> ;nodetool_password_file_path = <path to nodetool password file> ;nodetool_k8s_secrets_path = <path to nodetool kubernetes secrets folder> ;nodetool_host = <host name or IP to use for nodetool> ;nodetool_port = <port number to use for nodetool> ;certfile= <Client SSL: path to rootCa certificate> ;usercert= <Client SSL: path to user certificate> ;userkey= <Client SSL: path to user key> ;certfile= /usr/local/cassandra/conf/.keystore.pub ;usercert= /usr/local/cassandra/conf/.keystore.pub ;userkey= /usr/local/cassandra/conf/.keystore.priv ;validate= false ;sstableloader_ts = <Client SSL: full path to truststore> ;sstableloader_tspw = <Client SSL: password of the truststore> ;sstableloader_ks = <Client SSL: full path to keystore> ;sstableloader_kspw = <Client SSL: password of the keystore> ;sstableloader_bin = <Location of the sstableloader binary if not in PATH> ; Enable this to add the '--ssl' parameter to nodetool. The nodetool-ssl.properties is expected to be in the normal location ;nodetool_ssl = true ; Command ran to verify if Cassandra is running on a node. Defaults to "nodetool version" check_running = nodetool version ; Disable/Enable ip address resolving. ; Disabling this can help when fqdn resolving gives different domain names for local and remote nodes ; which makes backup succeed but Medusa sees them as incomplete. ; Defaults to True. resolve_ip_addresses = False ; When true, almost all commands executed by Medusa are prefixed withsudo. ; Does not affect the use_sudo_for_restore setting in the 'storage' section. ; See https://github.com/thelastpickle/cassandra-medusa/issues/318 ; Defaults to True use_sudo = False [storage] storage_provider = s3 ; storage_provider should be either of "local", "google_storage" or "s3" ;region = <Region hosting the storage> ; Name of the bucket used for storing backups bucket_name = cassandra-backup ; storage_provider should be "s3" ;kms_id = <ARN of KMS key used for server-side bucket encryption> ; JSON key file for service account with access to GCS bucket or AWS credentials file (home-dir/.aws/credentials) ;key_file = /etc/medusa/credentials ; Path of the local storage bucket (used only with 'local' storage provider) ;base_path = /path/to/backups ; Any prefix used for multitenancy in the same bucket ;prefix = cass-dev-va6 ;fqdn = <enforce the name of the local node. Computed automatically if not provided.> fqdn = medusa ; Number of days before backups are purged. 0 means backups dont get purged by age (default) max_backup_age = 0 ; Number of backups to retain. Older backups will get purged beyond that number. 0 means backups dont get purged by count (default) max_backup_count = 0 ; Both thresholds can be defined for backup purge. ; Used to throttle S3 backups/restores: transfer_max_bandwidth = 100MB/s ; Max number of downloads/uploads. Not used by the GCS backend. concurrent_transfers = 1 ; Size over which S3 uploads will be using the awscli with multi part uploads. Defaults to 100MB. multi_part_upload_threshold = 104857600 ; GC grace period for backed up files. Prevents race conditions between purge and running backups backup_grace_period_in_days = 10 ; When not using sstableloader to restore data on a node, Medusa will copy snapshot files from a ; temporary location into the cassandra data directroy. Medusa will then attempt to change the ; ownership of the snapshot files so the cassandra user can access them. ; Depending on how users/file permissions are set up on the cassandra instance, the medusa user ; may need elevated permissions to manipulate the files in the cassandra data directory. ; ; This option does NOT replace theuse_sudooption under the 'cassandra' section! ; See: https://github.com/thelastpickle/cassandra-medusa/pull/399 ; ; Defaults to True use_sudo_for_restore = True ;api_profile = <AWS profile to use> ;host = <Optional object storage host to connect to> ;port = <Optional object storage port to connect to> ; Configures the use of SSL to connect to the object storage system. ;secure = True ;aws_cli_path = <Location of the aws cli binary if not in PATH> [monitoring] ;monitoring_provider = <Provider used for sending metrics. Currently either of "ffwd" or "local"> [ssh] username = centos key_file = /tmp/test-01.pem port = 22 ;cert_file = <Path of public key signed certificate file to use for authentication. The corresponding private key must also be provided via key_file parameter> [checks] ;health_check = <Which ports to check when verifying a node restored properly. Options are 'cql' (default), 'thrift', 'all'.> ;query = <CQL query to run after a restore to verify it went OK> ;expected_rows = <Number of rows expected to be returned when the query runs. Not checked if not specified.> ;expected_result = <Coma separated string representation of values returned by the query. Checks only 1st row returned, and only if specified> ;enable_md5_checks = <During backups and verify, use md5 calculations to determine file integrity (in addition to size, which is used by default)> enable_md5_checks = False [logging] ; Controls file logging, disabled by default. enabled = 1 file = /var/log/medusa.log level = INFO ; Control the log output format format = [%(asctime)s] %(levelname)s: %(message)s ; Size over which log file will rotate maxBytes = 2000000 ; How many log files to keep backupCount = 10 [grpc] ; Set to true when running in grpc server mode. ; Allows to propagate the exceptions instead of exiting the program. ;enabled = False [kubernetes] ; The following settings are only intended to be configured if Medusa is running in containers, preferably in Kubernetes. ;enabled = False ;cassandra_url = <URL of the management API snapshot endpoint. For example: http://127.0.0.1:8080/api/v0/ops/node/snapshots> ; Enables the use of the management API to create snapshots. Falls back to using Jolokia if not enabled. ;use_mgmt_api = True

any suggestions to fix this issue are appreciated, Thanks !

We have modified the permissions for the "medusa-wrapper" executable and it fixed the issue.

Currently, we are facing an issue as "Some nodes failed to upload the backup". Any suggestion to fix this error?

FYI.

/usr/local/bin/medusa backup-cluster --backup-name=full-backup-fg-dev --seed-target=X.X.X.X --mode=full

[2023-12-01 07:22:49,772] INFO: Monitoring provider is noop [2023-12-01 07:22:50,622] INFO: Starting backup full-backup-fg-dev [2023-12-01 07:22:50,939] INFO: Resolving ip address X.X.X.X [2023-12-01 07:22:50,939] INFO: ip address to resolve X.X.X.X [2023-12-01 07:22:50,940] INFO: Resolving ip address X.X.X.X [2023-12-01 07:22:50,940] INFO: ip address to resolve X.X.X.X [2023-12-01 07:22:50,940] INFO: Resolving ip address X.X.X.X [2023-12-01 07:22:50,940] INFO: ip address to resolve X.X.X.X [2023-12-01 07:22:50,940] INFO: Resolving ip address X.X.X.X [2023-12-01 07:22:50,940] INFO: ip address to resolve X.X.X.X [2023-12-01 07:22:51,037] INFO: Creating snapshots on all nodes [2023-12-01 07:22:51,037] INFO: Executing "nodetool -Dcom.sun.jndi.rmiURLParsing=legacy snapshot -t medusa-full-backup-fg-dev" on following nodes ['X.X.X.X', 'X.X.X.X', 'X.X.X.X'] with a parallelism/pool size of 500 [2023-12-01 07:22:54,064] INFO: Job executing "nodetool -Dcom.sun.jndi.rmiURLParsing=legacy snapshot -t medusa-full-backup-fg-dev" ran and finished Successfully on all nodes. [2023-12-01 07:22:54,064] INFO: A snapshot medusa-full-backup-fg-dev was created on all nodes. [2023-12-01 07:22:54,065] INFO: Uploading snapshots from nodes to external storage [2023-12-01 07:22:54,065] INFO: Executing "mkdir -p /tmp/medusa-job-bdee0c05-080a-4e58-90b6-cc7c05738f67; cd /tmp/medusa-job-bdee0c05-080a-4e58-90b6-cc7c05738f67 && medusa-wrapper medusa -vvv backup-node --backup-name full-backup-fg-dev --mode full" on following nodes ['X.X.X.X', 'X.X.X.X', 'X.X.X.X'] with a parallelism/pool size of 1 [2023-12-01 07:22:56,340] ERROR: Job executing "mkdir -p /tmp/medusa-job-bdee0c05-080a-4e58-90b6-cc7c05738f67; cd /tmp/medusa-job-bdee0c05-080a-4e58-90b6-cc7c05738f67 && medusa-wrapper medusa -vvv backup-node --backup-name full-backup-fg-dev --mode full" ran and finished with errors on following nodes: ['X.X.X.X', 'X.X.X.X', 'X.X.X.X'] [2023-12-01 07:22:56,341] ERROR: Some nodes failed to upload the backup. [2023-12-01 07:22:56,341] ERROR: This error happened during the cluster backup: Some nodes failed to upload the backup. Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/medusa/backup_cluster.py", line 64, in orchestrate backup.execute(cql_session_provider) File "/usr/local/lib/python3.6/site-packages/medusa/backup_cluster.py", line 150, in execute self._upload_backup() File "/usr/local/lib/python3.6/site-packages/medusa/backup_cluster.py", line 177, in _upload_backup raise Exception(err_msg) Exception: Some nodes failed to upload the backup. [2023-12-01 07:22:56,342] ERROR: Something went wrong! Attempting to clean snapshots and exit. [2023-12-01 07:22:56,343] INFO: Executing "nodetool -Dcom.sun.jndi.rmiURLParsing=legacy clearsnapshot -t medusa-full-backup-dev" on following nodes ['X.X.X.X', 'X.X.X.X', 'X.X.X.X'] with a parallelism/pool size of 1 [2023-12-01 07:23:02,745] INFO: Job executing "nodetool -Dcom.sun.jndi.rmiURLParsing=legacy clearsnapshot -t medusa-full-backup-dev" ran and finished Successfully on all nodes. [2023-12-01 07:23:02,746] INFO: All nodes successfully cleared their snapshot.

Thanks Nagendra

thelastpickle / cassandra-medusa

medusa backup-cluster failed with medusa-wrapper Permission denied #690