Closed: nagcassandra closed this issue 7 months ago.
Attached is the medusa.ini file for reference.
We modified the permissions on the /usr/local/bin/medusa-wrapper executable, and that fixed the "Permission denied" issue.
We are now facing a new error, "Some nodes failed to upload the backup". Any suggestions for fixing this one?
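For reference, the permission fix was roughly the following, run on every node (a sketch; 755 is the mode we chose, adjust to your own policy):

```bash
# Before: the execute bit was missing, which is what produced
# "bash: /usr/local/bin/medusa-wrapper: Permission denied".
ls -l /usr/local/bin/medusa-wrapper

# Restore the execute bit (run as root or via sudo).
sudo chmod 755 /usr/local/bin/medusa-wrapper
```

The `medusa backup-cluster` output for the new failure is below, FYI: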
```
[2023-12-01 07:22:49,772] INFO: Monitoring provider is noop
[2023-12-01 07:22:50,622] INFO: Starting backup full-backup-fg-dev
[2023-12-01 07:22:50,939] INFO: Resolving ip address X.X.X.X
[2023-12-01 07:22:50,939] INFO: ip address to resolve X.X.X.X
[2023-12-01 07:22:50,940] INFO: Resolving ip address X.X.X.X
[2023-12-01 07:22:50,940] INFO: ip address to resolve X.X.X.X
[2023-12-01 07:22:50,940] INFO: Resolving ip address X.X.X.X
[2023-12-01 07:22:50,940] INFO: ip address to resolve X.X.X.X
[2023-12-01 07:22:50,940] INFO: Resolving ip address X.X.X.X
[2023-12-01 07:22:50,940] INFO: ip address to resolve X.X.X.X
[2023-12-01 07:22:51,037] INFO: Creating snapshots on all nodes
[2023-12-01 07:22:51,037] INFO: Executing "nodetool -Dcom.sun.jndi.rmiURLParsing=legacy snapshot -t medusa-full-backup-fg-dev" on following nodes ['X.X.X.X', 'X.X.X.X', 'X.X.X.X'] with a parallelism/pool size of 500
[2023-12-01 07:22:54,064] INFO: Job executing "nodetool -Dcom.sun.jndi.rmiURLParsing=legacy snapshot -t medusa-full-backup-fg-dev" ran and finished Successfully on all nodes.
[2023-12-01 07:22:54,064] INFO: A snapshot medusa-full-backup-fg-dev was created on all nodes.
[2023-12-01 07:22:54,065] INFO: Uploading snapshots from nodes to external storage
[2023-12-01 07:22:54,065] INFO: Executing "mkdir -p /tmp/medusa-job-bdee0c05-080a-4e58-90b6-cc7c05738f67; cd /tmp/medusa-job-bdee0c05-080a-4e58-90b6-cc7c05738f67 && medusa-wrapper medusa -vvv backup-node --backup-name full-backup-fg-dev --mode full" on following nodes ['X.X.X.X', 'X.X.X.X', 'X.X.X.X'] with a parallelism/pool size of 1
[2023-12-01 07:22:56,340] ERROR: Job executing "mkdir -p /tmp/medusa-job-bdee0c05-080a-4e58-90b6-cc7c05738f67; cd /tmp/medusa-job-bdee0c05-080a-4e58-90b6-cc7c05738f67 && medusa-wrapper medusa -vvv backup-node --backup-name full-backup-fg-dev --mode full" ran and finished with errors on following nodes: ['X.X.X.X', 'X.X.X.X', 'X.X.X.X']
[2023-12-01 07:22:56,341] ERROR: Some nodes failed to upload the backup.
[2023-12-01 07:22:56,341] ERROR: This error happened during the cluster backup: Some nodes failed to upload the backup.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/medusa/backup_cluster.py", line 64, in orchestrate
    backup.execute(cql_session_provider)
  File "/usr/local/lib/python3.6/site-packages/medusa/backup_cluster.py", line 150, in execute
    self._upload_backup()
  File "/usr/local/lib/python3.6/site-packages/medusa/backup_cluster.py", line 177, in _upload_backup
    raise Exception(err_msg)
Exception: Some nodes failed to upload the backup.
[2023-12-01 07:22:56,342] ERROR: Something went wrong! Attempting to clean snapshots and exit.
[2023-12-01 07:22:56,343] INFO: Executing "nodetool -Dcom.sun.jndi.rmiURLParsing=legacy clearsnapshot -t medusa-full-backup-dev" on following nodes ['X.X.X.X', 'X.X.X.X', 'X.X.X.X'] with a parallelism/pool size of 1
[2023-12-01 07:23:02,745] INFO: Job executing "nodetool -Dcom.sun.jndi.rmiURLParsing=legacy clearsnapshot -t medusa-full-backup-dev" ran and finished Successfully on all nodes.
[2023-12-01 07:23:02,746] INFO: All nodes successfully cleared their snapshot.
```
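The orchestration log only says that the per-node step failed. To surface the underlying error, we plan to rerun the node-level command by hand on one of the nodes (command copied verbatim from the log above; the job directory name is specific to that run):

```bash
# Rerun the exact upload step the orchestrator dispatched, without
# medusa-wrapper, so errors print straight to the terminal.
# Note: a failed backup may need a fresh --backup-name to rerun cleanly.
cd /tmp/medusa-job-bdee0c05-080a-4e58-90b6-cc7c05738f67
medusa -vvv backup-node --backup-name full-backup-fg-dev --mode full
```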
Thanks,
Nagendra
Hi @nagcassandra. To troubleshoot this further, we'd need to see the Medusa logs on the individual nodes; they are somewhere in /tmp. If you're still struggling with this, please share them.
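For example, something along these lines on one of the failing nodes (the job directory name comes from your orchestration log; treat this as a sketch, since the exact file names inside it may vary by Medusa version):

```bash
# List the job directories that backup-cluster created on this node.
ls -d /tmp/medusa-job-*

# medusa-wrapper runs the backup-node command from inside the job
# directory, so its captured output should live there (the exact file
# names are an assumption and may differ by Medusa version).
ls -l /tmp/medusa-job-bdee0c05-080a-4e58-90b6-cc7c05738f67/
```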
Hello Team,
As part of a POC, we are evaluating the Medusa backup and restore tool for our Apache Cassandra cluster.
Currently we are experiencing an issue with `medusa backup-cluster`: we get a "Permission denied" error for the /usr/local/bin/medusa-wrapper executable.
My environment:

```
$ /usr/local/bin/medusa --version
0.15.0

$ cassandra -v
3.11.13

$ python --version
Python 2.7.5

$ python3 --version
Python 3.6.8

$ java -version
openjdk version "1.8.0_382"
OpenJDK Runtime Environment (build 1.8.0_382-b05)
OpenJDK 64-Bit Server VM (build 25.382-b05, mixed mode)
```
```
[2023-11-30 04:46:23,454] INFO: [10.66.231.208] [err] bash: /usr/local/bin/medusa-wrapper: Permission denied
[2023-11-30 04:46:23,454] INFO: 10.66.231.208-stderr: bash: /usr/local/bin/medusa-wrapper: Permission denied
[2023-11-30 04:46:23,454] INFO: [10.66.231.212] [err] bash: /usr/local/bin/medusa-wrapper: Permission denied
[2023-11-30 04:46:23,454] INFO: 10.66.231.212-stderr: bash: /usr/local/bin/medusa-wrapper: Permission denied
[2023-11-30 04:46:23,455] INFO: [10.66.231.248] [err] bash: /usr/local/bin/medusa-wrapper: Permission denied
[2023-11-30 04:46:23,455] INFO: 10.66.231.248-stderr: bash: /usr/local/bin/medusa-wrapper: Permission denied
[2023-11-30 04:46:23,455] ERROR: Some nodes failed to upload the backup.
[2023-11-30 04:46:23,455] ERROR: This error happened during the cluster backup: Some nodes failed to upload the backup.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/medusa/backup_cluster.py", line 64, in orchestrate
    backup.execute(cql_session_provider)
  File "/usr/local/lib/python3.6/site-packages/medusa/backup_cluster.py", line 150, in execute
    self._upload_backup()
  File "/usr/local/lib/python3.6/site-packages/medusa/backup_cluster.py", line 177, in _upload_backup
    raise Exception(err_msg)
Exception: Some nodes failed to upload the backup.
[2023-11-30 04:46:23,456] ERROR: Something went wrong! Attempting to clean snapshots and exit.
```
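The error is each node's shell refusing to execute the wrapper, so the first thing we checked was the file mode on every node (a sketch using the SSH settings from the medusa.ini below; node IPs are the ones from the log):

```bash
# Check the wrapper's mode on each node; "Permission denied" from
# bash usually means the execute bit is missing for the SSH user.
# SSH user/key are taken from the [ssh] section of our medusa.ini.
for node in 10.66.231.208 10.66.231.212 10.66.231.248; do
  ssh -i /tmp/test-01.pem centos@"$node" 'ls -l /usr/local/bin/medusa-wrapper'
done
```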
Our medusa.ini (cql username/password masked):
```ini
[cassandra]
config_file = /usr/local/cassandra/conf/cassandra.yaml
cql_username = *********
cql_password = ********
; When using the following setting there must be files in:
; - <cql_k8s_secrets_path>/username containing username
; - <cql_k8s_secrets_path>/password containing password
;cql_k8s_secrets_path = <path to kubernetes secrets folder>
;nodetool_username = <my nodetool username>
;nodetool_password = <my nodetool password>
;nodetool_password_file_path = <path to nodetool password file>
;nodetool_k8s_secrets_path = <path to nodetool kubernetes secrets folder>
;nodetool_host = <host name or IP to use for nodetool>
;nodetool_port = <port number to use for nodetool>
;certfile= <Client SSL: path to rootCa certificate>
;usercert= <Client SSL: path to user certificate>
;userkey= <Client SSL: path to user key>
;certfile= /usr/local/cassandra/conf/.keystore.pub
;usercert= /usr/local/cassandra/conf/.keystore.pub
;userkey= /usr/local/cassandra/conf/.keystore.priv
;validate= false
;sstableloader_ts = <Client SSL: full path to truststore>
;sstableloader_tspw = <Client SSL: password of the truststore>
;sstableloader_ks = <Client SSL: full path to keystore>
;sstableloader_kspw = <Client SSL: password of the keystore>
;sstableloader_bin = <Location of the sstableloader binary if not in PATH>
; Enable this to add the '--ssl' parameter to nodetool. The nodetool-ssl.properties is expected to be in the normal location
;nodetool_ssl = true
; Command ran to verify if Cassandra is running on a node. Defaults to "nodetool version"
check_running = nodetool version
; Disable/Enable ip address resolving.
; Disabling this can help when fqdn resolving gives different domain names for local and remote nodes
; which makes backup succeed but Medusa sees them as incomplete.
; Defaults to True.
resolve_ip_addresses = False
; When true, almost all commands executed by Medusa are prefixed with sudo.
; Does not affect the use_sudo_for_restore setting in the 'storage' section.
; See https://github.com/thelastpickle/cassandra-medusa/issues/318
; Defaults to True
use_sudo = False

[storage]
storage_provider = s3
; storage_provider should be either of "local", "google_storage" or "s3"
;region = <Region hosting the storage>
; Name of the bucket used for storing backups
bucket_name = cassandra-backup
; storage_provider should be "s3"
;kms_id = <ARN of KMS key used for server-side bucket encryption>
; JSON key file for service account with access to GCS bucket or AWS credentials file (home-dir/.aws/credentials)
;key_file = /etc/medusa/credentials
; Path of the local storage bucket (used only with 'local' storage provider)
;base_path = /path/to/backups
; Any prefix used for multitenancy in the same bucket
;prefix = cass-dev-va6
;fqdn = <enforce the name of the local node. Computed automatically if not provided.>
fqdn = medusa
; Number of days before backups are purged. 0 means backups dont get purged by age (default)
max_backup_age = 0
; Number of backups to retain. Older backups will get purged beyond that number. 0 means backups dont get purged by count (default)
max_backup_count = 0
; Both thresholds can be defined for backup purge.
; Used to throttle S3 backups/restores:
transfer_max_bandwidth = 100MB/s
; Max number of downloads/uploads. Not used by the GCS backend.
concurrent_transfers = 1
; Size over which S3 uploads will be using the awscli with multi part uploads. Defaults to 100MB.
multi_part_upload_threshold = 104857600
; GC grace period for backed up files. Prevents race conditions between purge and running backups
backup_grace_period_in_days = 10
; When not using sstableloader to restore data on a node, Medusa will copy snapshot files from a
; temporary location into the cassandra data directory. Medusa will then attempt to change the
; ownership of the snapshot files so the cassandra user can access them.
; Depending on how users/file permissions are set up on the cassandra instance, the medusa user
; may need elevated permissions to manipulate the files in the cassandra data directory.
;
; This option does NOT replace the use_sudo option under the 'cassandra' section!
; See: https://github.com/thelastpickle/cassandra-medusa/pull/399
;
; Defaults to True
use_sudo_for_restore = True
;api_profile = <AWS profile to use>
;host = <Optional object storage host to connect to>
;port = <Optional object storage port to connect to>
; Configures the use of SSL to connect to the object storage system.
;secure = True
;aws_cli_path = <Location of the aws cli binary if not in PATH>

[monitoring]
;monitoring_provider = <Provider used for sending metrics. Currently either of "ffwd" or "local">

[ssh]
username = centos
key_file = /tmp/test-01.pem
port = 22
;cert_file = <Path of public key signed certificate file to use for authentication. The corresponding private key must also be provided via key_file parameter>

[checks]
;health_check = <Which ports to check when verifying a node restored properly. Options are 'cql' (default), 'thrift', 'all'.>
;query = <CQL query to run after a restore to verify it went OK>
;expected_rows = <Number of rows expected to be returned when the query runs. Not checked if not specified.>
;expected_result = <Comma separated string representation of values returned by the query. Checks only 1st row returned, and only if specified>
;enable_md5_checks = <During backups and verify, use md5 calculations to determine file integrity (in addition to size, which is used by default)>
enable_md5_checks = False

[logging]
; Controls file logging, disabled by default.
enabled = 1
file = /var/log/medusa.log
level = INFO
; Control the log output format
format = [%(asctime)s] %(levelname)s: %(message)s
; Size over which log file will rotate
maxBytes = 2000000
; How many log files to keep
backupCount = 10

[grpc]
; Set to true when running in grpc server mode.
; Allows to propagate the exceptions instead of exiting the program.
;enabled = False

[kubernetes]
; The following settings are only intended to be configured if Medusa is running in containers, preferably in Kubernetes.
;enabled = False
;cassandra_url = <URL of the management API snapshot endpoint. For example: http://127.0.0.1:8080/api/v0/ops/node/snapshots>
; Enables the use of the management API to create snapshots. Falls back to using Jolokia if not enabled.
;use_mgmt_api = True
```
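We are also collecting the node-local logs: per the [logging] section above, each node writes to /var/log/medusa.log, which we pull roughly like this (a sketch reusing the [ssh] settings; the node IPs are the ones from the permission-denied log):

```bash
# Copy each node's local Medusa log (path from the [logging] section)
# to the current machine for inspection.
for node in 10.66.231.208 10.66.231.212 10.66.231.248; do
  scp -i /tmp/test-01.pem centos@"$node":/var/log/medusa.log "medusa-$node.log"
done
```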
Any suggestions to fix this issue are appreciated. Thanks!