thelastpickle / cassandra-medusa

Apache Cassandra Backup and Restore Tool
Apache License 2.0

restore-node - This operation would block forever #811

Open chrisjmiller1 opened 1 month ago

chrisjmiller1 commented 1 month ago


Hi,

I'm testing restore from medusa on a simple cluster and getting the following error for both medusa 0.15 and 0.21.

Any ideas how to resolve?

```
Traceback (most recent call last):
  File "/usr/local/bin/medusa", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 84, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/medusa/medusacli.py", line 273, in restore_node
    medusa.restore_node.restore_node(medusaconfig, Path(temp_dir), backup_name, in_place, keep_auth, seeds,
  File "/usr/local/lib/python3.9/site-packages/medusa/restore_node.py", line 50, in restore_node
    restore_node_locally(config, temp_dir, backup_name, in_place, keep_auth, seeds, storage,
  File "/usr/local/lib/python3.9/site-packages/medusa/restore_node.py", line 91, in restore_node_locally
    download_data(config.storage, node_backup, fqtns_to_restore, destination=download_dir)
  File "/usr/local/lib/python3.9/site-packages/medusa/download.py", line 52, in download_data
    storage.storage_driver.download_blobs(src_batch, dst)
  File "/usr/local/lib/python3.9/site-packages/medusa/storage/s3_base_storage.py", line 122, in download_blobs
    return medusa.storage.s3_compat_storage.concurrent.download_blobs(
  File "/usr/local/lib/python3.9/site-packages/medusa/storage/s3_compat_storage/concurrent.py", line 166, in download_blobs
    job.execute(list(src))
  File "/usr/local/lib/python3.9/site-packages/medusa/storage/s3_compat_storage/concurrent.py", line 54, in execute
    return list(executor.map(self.with_storage, iterables))
  File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 598, in map
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 598, in <listcomp>
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "/usr/lib64/python3.9/concurrent/futures/thread.py", line 176, in submit
    self._adjust_thread_count()
  File "/usr/lib64/python3.9/concurrent/futures/thread.py", line 182, in _adjust_thread_count
    if self._idle_semaphore.acquire(timeout=0):
  File "/usr/lib64/python3.9/threading.py", line 450, in acquire
    self._cond.wait(timeout)
  File "/usr/lib64/python3.9/threading.py", line 318, in wait
    gotit = waiter.acquire(False)
  File "/usr/local/lib64/python3.9/site-packages/gevent/thread.py", line 132, in acquire
    sleep()
  File "/usr/local/lib64/python3.9/site-packages/gevent/hub.py", line 159, in sleep
    waiter.get()
  File "src/gevent/_waiter.py", line 143, in gevent._gevent_c_waiter.Waiter.get
  File "src/gevent/_waiter.py", line 154, in gevent._gevent_c_waiter.Waiter.get
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
gevent.exceptions.LoopExit: This operation would block forever
	Hub: <Hub '' at 0x7f9be30bdef0 epoll default pending=0 ref=0 fileno=3 resolver=<gevent.resolver.thread.Resolver at 0x7f9be058ba60 pool=<ThreadPool at 0x7f9be307c970 tasks=0 size=1 maxsize=10 hub=<Hub at 0x7f9be30bdef0 thread_ident=0x7f9bed6a4740>>> threadpool=<ThreadPool at 0x7f9be307c970 tasks=0 size=1 maxsize=10 hub=<Hub at 0x7f9be30bdef0 thread_ident=0x7f9bed6a4740>> thread_ident=0x7f9bed6a4740>
	Handles: []
```
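For context on where this dies: the bottom of the trace shows Medusa's concurrent download path submitting work to a stdlib `concurrent.futures.ThreadPoolExecutor`. `submit()` tries to acquire the executor's internal `_idle_semaphore`, but because Medusa runs under gevent, that lock has been monkey-patched into a cooperative gevent primitive; when the hub has nothing else that could wake the waiter, gevent raises `LoopExit` ("This operation would block forever"). The fan-out itself boils down to the following pattern — a stdlib-only sketch with a hypothetical `fetch_blob` stand-in, not Medusa's actual code, and no gevent involved:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_blob(key):
    # Hypothetical stand-in for Medusa's per-blob download callable
    # (the real one streams an object out of S3 via the storage driver).
    return f"downloaded:{key}"

def download_blobs(keys, max_workers=4):
    # Same shape as concurrent.py's execute():
    # list(executor.map(self.with_storage, iterables))
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch_blob, keys))

print(download_blobs(["md-1-big-Data.db", "md-1-big-Index.db"]))
```

Under plain CPython this completes normally; the failure in the report only appears when gevent's monkey-patching has replaced the locks the executor relies on, which is why an environmental difference (gevent version, how the patching is applied) is a plausible suspect.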

Issue is synchronized with this Jira Story by Unito. Issue Number: MED-99

chrisjmiller1 commented 1 week ago

Hi @adejanovski - just wondering how this issue is progressing? Is there a workaround that I could implement? Thanks. Chris.

adejanovski commented 1 week ago

I've never seen this error, and our tests are passing, so I assume there's something environmental at play here. Could you provide more info on your setup (OS, versions, etc.) and the sequence of commands to reproduce the issue?

chrisjmiller1 commented 1 week ago

OS is Rocky Linux 8.7.

Backup command:

```
nohup medusa backup --backup-name=backup1 &
```

Restore command:

```
nohup sudo medusa restore-node --backup-name=backup1 &
```

```
$ medusa list-backups
[2024-11-15 15:59:16,570] INFO: Resolving ip address
[2024-11-15 15:59:16,571] INFO: ip address to resolve XX
[2024-11-15 15:59:16,583] INFO: Found credentials in shared credentials file: /etc/medusa/credentials
backup1 (started: 2024-11-15 10:16:52, finished: 2024-11-15 10:22:39)
backup2 (started: 2024-11-15 10:22:58, finished: 2024-11-15 10:23:46)
```

chrisjmiller1 commented 1 week ago

Hi @adejanovski, do you have any feedback regarding my last update? Thanks.