thelastpickle / cassandra-medusa

Apache Cassandra Backup and Restore Tool
Apache License 2.0
264 stars 143 forks source link

Issue when using sstableloader option while internode encryption is enabled on Cassandra #538

Open venkata-naladala opened 2 years ago

venkata-naladala commented 2 years ago

Project board link

Medusa version: 0.13.4 Cassandra version: 4.0.6

One of our customers was having an issue while using sstableloader option with medusa when encryption was enabled on Cassandra. I was testing this out on a single node cluster. I took the backup with the below command and tried to restore it with the below restore command. The restore is failing on the node.

Backup command:

medusa -v backup-cluster --backup-name=backup_20221104_2

Restore command:

medusa -v restore-cluster --backup-name=backup_20221104_2 --seed-target <ip> --temp-dir /tmp/ --use-sstableloader --table keyspace1.standard1

The below is from medusa.log

[2022-11-04 16:25:37,991] DEBUG: Closing channel
[2022-11-04 16:25:37,991] ERROR: Job executing "mkdir -p /tmp/medusa-job-66cd08e7-9a77-42bd-bc85-6308c840ca09; cd /tmp/medusa-job-66cd08e7-9a77-42bd-bc85-6308c840ca09 && medusa-wrapper sudo medusa  --fqdn=%s -vvv restore-node --in-place  %s --no-verify --backup-name backup_20221104_2 --temp-dir /tmp --use-sstableloader  --table keyspace1.standard1" ran and finished with errors on following nodes: ['IP']
[2022-11-04 16:25:37,992] DEBUG: Waiting for stdout reader
[2022-11-04 16:25:37,992] DEBUG: Reading from stdout buffer
[2022-11-04 16:25:37,992] DEBUG: Waiting for stderr reader
[2022-11-04 16:25:37,992] DEBUG: Reading from stderr buffer
[2022-11-04 16:25:37,992] ERROR: Some nodes failed to restore. Exiting 

The below is from the stderr file that was created under tmp/medusa-job directory

[2022-11-04 16:25:31,087] DEBUG: Restoring table roles-5bc52802de2535edaeab188eecebb090 with sstableloader...
[2022-11-04 16:25:31,088] INFO: Resolving ip address 
[2022-11-04 16:25:31,089] INFO: ip address to resolve ip
[2022-11-04 16:25:31,090] DEBUG: Resolved ip to ip-ip.us-west-2.compute.internal
[2022-11-04 16:25:31,090] INFO: SPECIAL LINE STORAGE PORT 7001
WARN  16:25:32,599 Only 32.601GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
ERROR 16:25:37,191 [Stream #50143950-5c5d-11ed-adb4-8bde45d884f0] Streaming error occurred on session with peer ip:7000
java.io.IOException: failed to connect to /ip:7000 for streaming data
    at org.apache.cassandra.streaming.DefaultConnectionFactory.createConnection(DefaultConnectionFactory.java:55)
    at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:50)
    at org.apache.cassandra.streaming.async.NettyStreamingMessageSender.createChannel(NettyStreamingMessageSender.java:199)
    at org.apache.cassandra.streaming.async.NettyStreamingMessageSender.setupControlMessageChannel(NettyStreamingMessageSender.java:180)
    at org.apache.cassandra.streaming.async.NettyStreamingMessageSender.sendMessage(NettyStreamingMessageSender.java:248)
    at org.apache.cassandra.streaming.async.NettyStreamingMessageSender.initialize(NettyStreamingMessageSender.java:149)
    at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:363)
    at org.apache.cassandra.streaming.StreamCoordinator.startSession(StreamCoordinator.java:263)
    at org.apache.cassandra.streaming.StreamCoordinator.access$700(StreamCoordinator.java:37)
    at org.apache.cassandra.streaming.StreamCoordinator$HostStreamingData.connectAllStreamSessions(StreamCoordinator.java:309)
    at org.apache.cassandra.streaming.StreamCoordinator.connectAllStreamSessions(StreamCoordinator.java:108)
    at org.apache.cassandra.streaming.StreamCoordinator.connect(StreamCoordinator.java:102)
    at org.apache.cassandra.streaming.StreamResultFuture.createInitiator(StreamResultFuture.java:98)
    at org.apache.cassandra.streaming.StreamPlan.execute(StreamPlan.java:179)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:220)
    at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:78)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:51)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: /ip:7001
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
    at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
    at io.netty.channel.unix.Socket.finishConnect(Socket.java:251)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:673)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:650)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:530)
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:470)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:750)
WARN  16:25:37,201 [Stream #50143950-5c5d-11ed-adb4-8bde45d884f0] Stream failed
ERROR 16:25:37,464 [Stream #50143950-5c5d-11ed-adb4-8bde45d884f0] Streaming error occurred on session with peer ip:7000
java.lang.RuntimeException: stream has been closed, cannot send Prepare SYN (0 requests,  14 files}
    at org.apache.cassandra.streaming.async.NettyStreamingMessageSender.sendMessage(NettyStreamingMessageSender.java:231)
    at org.apache.cassandra.streaming.StreamSession.onInitializationComplete(StreamSession.java:624)
    at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:364)
    at org.apache.cassandra.streaming.StreamCoordinator.startSession(StreamCoordinator.java:263)
    at org.apache.cassandra.streaming.StreamCoordinator.access$700(StreamCoordinator.java:37)
    at org.apache.cassandra.streaming.StreamCoordinator$HostStreamingData.connectAllStreamSessions(StreamCoordinator.java:309)
    at org.apache.cassandra.streaming.StreamCoordinator.connectAllStreamSessions(StreamCoordinator.java:108)
    at org.apache.cassandra.streaming.StreamCoordinator.connect(StreamCoordinator.java:102)
    at org.apache.cassandra.streaming.StreamResultFuture.createInitiator(StreamResultFuture.java:98)
    at org.apache.cassandra.streaming.StreamPlan.execute(StreamPlan.java:179)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:220)
    at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:78)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:51)
Streaming to the following hosts failed:
[/ip:7000]
java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
    at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
    at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:100)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:51)
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
    at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88)
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
    at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
    at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
    at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
    at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:221)
    at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:197)
    at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:507)
    at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:662)
    at org.apache.cassandra.streaming.async.NettyStreamingMessageSender.sendMessage(NettyStreamingMessageSender.java:254)
    at org.apache.cassandra.streaming.async.NettyStreamingMessageSender.initialize(NettyStreamingMessageSender.java:149)
    at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:363)
    at org.apache.cassandra.streaming.StreamCoordinator.startSession(StreamCoordinator.java:263)
    at org.apache.cassandra.streaming.StreamCoordinator.access$700(StreamCoordinator.java:37)
    at org.apache.cassandra.streaming.StreamCoordinator$HostStreamingData.connectAllStreamSessions(StreamCoordinator.java:309)
    at org.apache.cassandra.streaming.StreamCoordinator.connectAllStreamSessions(StreamCoordinator.java:108)
    at org.apache.cassandra.streaming.StreamCoordinator.connect(StreamCoordinator.java:102)
    at org.apache.cassandra.streaming.StreamResultFuture.createInitiator(StreamResultFuture.java:98)
    at org.apache.cassandra.streaming.StreamPlan.execute(StreamPlan.java:179)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:220)
    at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:78)
    ... 1 more
Exception in thread "main" org.apache.cassandra.tools.BulkLoadException: java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
    at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:115)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:51)
Caused by: java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
    at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:552)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:513)
    at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:100)
    ... 1 more
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
    at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:88)
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
    at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
    at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
    at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
    at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:221)
    at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:197)
    at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:507)
    at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:662)
    at org.apache.cassandra.streaming.async.NettyStreamingMessageSender.sendMessage(NettyStreamingMessageSender.java:254)
    at org.apache.cassandra.streaming.async.NettyStreamingMessageSender.initialize(NettyStreamingMessageSender.java:149)
    at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:363)
    at org.apache.cassandra.streaming.StreamCoordinator.startSession(StreamCoordinator.java:263)
    at org.apache.cassandra.streaming.StreamCoordinator.access$700(StreamCoordinator.java:37)
    at org.apache.cassandra.streaming.StreamCoordinator$HostStreamingData.connectAllStreamSessions(StreamCoordinator.java:309)
    at org.apache.cassandra.streaming.StreamCoordinator.connectAllStreamSessions(StreamCoordinator.java:108)
    at org.apache.cassandra.streaming.StreamCoordinator.connect(StreamCoordinator.java:102)
    at org.apache.cassandra.streaming.StreamResultFuture.createInitiator(StreamResultFuture.java:98)
    at org.apache.cassandra.streaming.StreamPlan.execute(StreamPlan.java:179)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:220)
    at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:78)
    ... 1 more
Traceback (most recent call last):
  File "/usr/bin/medusa", line 11, in <module>
    load_entry_point('cassandra-medusa==0.13.4', 'console_scripts', 'medusa')()
  File "/usr/share/cassandra-medusa/lib/python3.6/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/share/cassandra-medusa/lib/python3.6/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/share/cassandra-medusa/lib/python3.6/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/share/cassandra-medusa/lib/python3.6/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/share/cassandra-medusa/lib/python3.6/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/share/cassandra-medusa/lib/python3.6/site-packages/click/decorators.py", line 84, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/usr/share/cassandra-medusa/lib/python3.6/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/share/cassandra-medusa/lib/python3.6/site-packages/medusa/medusacli.py", line 271, in restore_node
    verify, set(keyspaces), set(tables), use_sstableloader, version_target)
  File "/usr/share/cassandra-medusa/lib/python3.6/site-packages/medusa/restore_node.py", line 54, in restore_node
    keyspaces, tables)
  File "/usr/share/cassandra-medusa/lib/python3.6/site-packages/medusa/restore_node.py", line 181, in restore_node_sstableloader
    cassandra.native_port)
  File "/usr/share/cassandra-medusa/lib/python3.6/site-packages/medusa/restore_node.py", line 233, in invoke_sstableloader
    output = subprocess.check_output(sstableloader_args)
  File "/usr/share/cassandra-medusa/lib/python3.6/site-packages/gevent/subprocess.py", line 405, in check_output
    raise CalledProcessError(retcode, process.args, output=output)
subprocess.CalledProcessError: Command '['sstableloader', '-d', 'ip-ip.us-west-2.compute.internal', '--conf-path', '/etc/cassandra/cassandra.yaml', '--username', 'cassandra', '--password', 'cassandra', '--no-progress', '--port', '9042', '/tmp/medusa-restore-8a387947-34ba-4d2e-a9c9-7da172ffb84d/system_auth/roles-5bc52802de2535edaeab188eecebb090', '--storage-port', '7001']' returned non-zero exit status 1.

I have tried with sstableloader utility it works with port 7000 (default) and fails if port 7001 was used.

sstableloader -d `hostname` --storage-port 7000 -u cassandra -pw cassandra  keyspace1/standard1    sstableloader -d `hostname` --storage-port 7001 -u cassandra -pw cassandra  keyspace1/standard1So the issue is that medusa is picking up port 7001 even though it was commented out in cassandra.yaml.

In the below code section, self._release_version is None so it is skipping the condition and defaulting to 7001. I wasn't able to figure out where the value is set for self._release_version.

https://github.com/thelastpickle/cassandra-medusa/blob/6f436885e3e06330afaf748564a11b168f6331c5/medusa/cassandra_utils.py#L260

            else:
                # ssl_storage_port not specified, and found a version of c* 4+
                if self._release_version is not None and Version(self._release_version) >= Version('4-a') and \
                        'storage_port' in self._config and self._config['storage_port'] is not None:
                    return self._config['storage_port']
                return "7001"

I have hardcoded the version as 4.0.6 like below and it works as expected.

           else:
                # ssl_storage_port not specified, and found a version of c* 4+
                self._release_version = '4.0.6'
                if self._release_version is not None and Version(self._release_version) >= Version('4-a') and \
                        'storage_port' in self._config and self._config['storage_port'] is not None:
                    return self._config['storage_port']
                return "7001"

One workaround was setting both storage_port and ssl_storage_port to 7000 in cassandra.yaml file and the restore command work successfully as it is using the 7000 port.

┆Issue is synchronized with this Jira Story by Unito ┆Issue Number: MED-38

rzvoncek commented 7 months ago

There used to be a bug in the sstableloader that made it not work when encryption was enabled. The solution to that was to bump (~manually replace a jar) something that I meanwhile forgot about. However, I'm not sure how that would help here.