instaclustr / icarus

Sidecar for Cassandra with integrated backup / restore
https://instaclustr.com
Apache License 2.0

Backup taken with globalRequest enabled cannot be restored #3

Closed cin closed 4 years ago

cin commented 4 years ago

Hopefully I'm just doing something wrong here. I've tried it several ways to no avail, though. I can get the backups to work just fine, and they look correct in object storage too. If I just back up a single node, I can restore it with no issues as well. However, if I try to restore (even a single node) from a backup taken with globalRequest: true, it fails with the following error (which I formatted for ease of reading).

There is not one key which satisfies key filter: [
  S3ObjectSummary{
    bucketName='my-bucket', 
    key='cassandra/dc1/792aa697-ca02-4b44-83c0-1dc6c48ef47f/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec', 
    eTag='ab34f1c0e87d136f503de0b238049575', 
    size=5158, 
    lastModified=Tue Jul 14 05:44:18 UTC 2020, 
    storageClass='STANDARD', 
    owner=S3Owner [
      name=6f862b8a-db99-4820-93f5-0b1251d0daca,
      id=6f862b8a-db99-4820-93f5-0b1251d0daca
    ]
  },
  S3ObjectSummary{
    bucketName='my-bucket', 
    key='cassandra/dc1/99fe4904-91ad-402a-bc0d-6acbb31def99/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec',
    eTag='68a294ccd1abd9104dbbb5e88f451bdc',
    size=1971,
    lastModified=Tue Jul 14 05:44:18 UTC 2020, 
    storageClass='STANDARD', 
    owner=S3Owner [
      name=6f862b8a-db99-4820-93f5-0b1251d0daca,
      id=6f862b8a-db99-4820-93f5-0b1251d0daca
    ]
  }, 
  S3ObjectSummary{
    bucketName='my-bucket', 
    key='cassandra/dc1/a0eb1f45-5f0f-4589-9919-c1d32de2c68d/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec', 
    eTag='f88d981fad23a80184d421372eb3b21c', 
    size=6100, 
    lastModified=Tue Jul 14 05:44:18 UTC 2020, 
    storageClass='STANDARD', 
    owner=S3Owner [
      name=6f862b8a-db99-4820-93f5-0b1251d0daca,
      id=6f862b8a-db99-4820-93f5-0b1251d0daca
    ]
  }, 
  S3ObjectSummary{
    bucketName='my-bucket',
    key='cassandra/dc2/3852f40d-8bfe-4bc2-b529-3aba699b44d7/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec',
    eTag='90a50752c6813e9d6910d09f2dcbf735', 
    size=4198, lastModified=Tue Jul 14 05:44:17 UTC 2020, 
    storageClass='STANDARD', 
    owner=S3Owner [
      name=6f862b8a-db99-4820-93f5-0b1251d0daca,
      id=6f862b8a-db99-4820-93f5-0b1251d0daca
    ]
  }, 
  S3ObjectSummary{
    bucketName='my-bucket', 
    key='cassandra/dc2/b5cab591-bee5-4649-ac8d-ad80cef79287/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec', 
    eTag='c027f679c82e29a21cdffe3f4dd10504', 
    size=1754, 
    lastModified=Tue Jul 14 05:44:17 UTC 2020, 
    storageClass='STANDARD', 
    owner=S3Owner [
      name=6f862b8a-db99-4820-93f5-0b1251d0daca,
      id=6f862b8a-db99-4820-93f5-0b1251d0daca
    ]
  }, 
  S3ObjectSummary{
    bucketName='my-bucket',
    key='cassandra/dc2/bcc2d49c-0416-40ff-b00f-ae208513ceeb/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec', 
    eTag='39babb109aa669aee29eaf0cb21c1aba', 
    size=5149, lastModified=Tue Jul 14 05:44:17 UTC 2020, 
    storageClass='STANDARD', 
    owner=S3Owner [
      name=6f862b8a-db99-4820-93f5-0b1251d0daca,
      id=6f862b8a-db99-4820-93f5-0b1251d0daca
    ]
  }
]

So you can see that it's picking out the 6 directories/nodes that I have backed up (for testing, I just set up a two-DC cluster with 3 nodes per DC).

Here's the POST body:

        {
          "type": "restore",
          "cassandraDirectory": "/var/lib/cassandra",
          "cassandraConfigDirectory": "/etc/cassandra",
          "storageLocation": "s3://my-bucket",
          "snapshotTag": "hello",
          "k8sNamespace": "craig",
          "k8sSecretName": "c-backup-secret",
          "entities": "system_auth",
          "restorationStrategyType": "import",
          "updateCassandraYaml": true,
          "restorationPhase": "download",
          "import": {
            "type": "import",
            "sourceDir": "/var/lib/cassandra/data/downloadedsstables"
          },
          "globalRequest": false
        }

I've also tried it from the command line with similar results:

java -jar /cassandra-sidecar.jar \
backup-restore \
restore  \
--jmx-service=service:jmx:rmi://127.0.0.1/jndi/rmi://127.0.0.1:7199/jmxrmi \
--jmx-user=xxx \
--jmx-password=xxx \
--k8s-namespace=craig \
--k8s-secret-name=c-backup-secret \
--data-directory=/var/lib/cassandra \
--storage-location=s3://my-bucket \
--snapshot-tag=hello \
--restoration-strategy-type=import \
--restoration-phase-type=download \
--import-source-dir=/var/lib/cassandra/data/downloadedsstables

Is there a setting or something that I'm missing here?

EDIT: Here's the backup POST body that I'm using:

{
    "type": "backup",
    "storageLocation": "s3://my-bucket",
    "snapshotTag": "hello",
    "entities": "system_auth",
    "k8sNamespace": "craig",
    "k8sSecretName": "c-backup-secret",
    "dataDirectory": "/var/lib/cassandra",
    "globalRequest": true
}
smiklosovic commented 4 years ago

Hi @cin,

I am sorry you are experiencing this; we do not have the docs fully written yet for the 2.0.0-alpha versions, so the usage might be a little bit frustrating. We will cut 2.0 soon with all the docs etc.

However, the issue you are hitting is that if you back up globally, you have 6 backups, one per node. If you want to restore, you have to recognize two scenarios:

1) You want to restore a set of tables / keyspaces on a running cluster.
2) You want to restore the whole cluster from scratch, meaning you start with a completely empty cluster (no node started at all) and the whole cluster is brought back up.

Point 2) is implemented only locally for now (I am working on it as I write this) and it works very well in conjunction with our operator. I do not want to go down that rabbit hole and explain how 2) works in detail, as it is fairly complicated, but just keep in mind that by doing so you might recreate your whole cluster from a backup.

So, for 1), globalRequest should be set to "true" in your restore request and storageLocation can stay as you have it. If you set globalRequest to "false" as you did and leave storageLocation the same, the sidecar simply does not know which node to take a manifest from, because there are many of them, one per node. That is what the message "There is not one key which satisfies key filter" is about: it expects exactly one key, but the filter returned several. So your request has to be narrowed down.
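
For example, a global restore request would be essentially the one you posted, just with globalRequest flipped to true (a sketch reusing the fields from your own request):

{
    "type": "restore",
    "cassandraDirectory": "/var/lib/cassandra",
    "cassandraConfigDirectory": "/etc/cassandra",
    "storageLocation": "s3://my-bucket",
    "snapshotTag": "hello",
    "k8sNamespace": "craig",
    "k8sSecretName": "c-backup-secret",
    "entities": "system_auth",
    "restorationStrategyType": "import",
    "updateCassandraYaml": true,
    "restorationPhase": "download",
    "import": {
        "type": "import",
        "sourceDir": "/var/lib/cassandra/data/downloadedsstables"
    },
    "globalRequest": true
}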

I wonder why one would want to back up with globalRequest: true but restore with globalRequest: false - something fishy is going on there. Normally you want your cluster to be restored in its entirety for a table / keyspace. Why are you doing this?

However, if you truly want to proceed that way, you would have to use globalRequest: false together with a node-specific storageLocation: s3://my-bucket/cluster-name/dc-name/node-id.
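
For instance, using one of the node IDs from your error listing above (illustrative only; the rest of the request stays as in the global example, just with globalRequest set to false):

    "storageLocation": "s3://my-bucket/cassandra/dc1/99fe4904-91ad-402a-bc0d-6acbb31def99",
    "globalRequest": false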

Anyway, your example is very strange because I do not think you can restore system_auth on a running node. You are using "restorationStrategyType" equal to "import", which means it will import the downloaded SSTables via a JMX call, but the table is truncated first - and truncation is a cluster-wide operation - so you would basically truncate your whole system_auth, which is not a good idea imho, but you can try and see what happens :)

If you omit "restorationStrategyType", it will use the so-called "InPlaceRestorationStrategy", which just deletes SSTables that should not be there and copies over the SSTables that should be, but you cannot do this on a running cluster. That strategy checks that it cannot connect to the Cassandra node - a sign that the node is down - so it may proceed with deleting those SSTables directly from disk and replacing them with the downloaded files.
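
If you wanted to try the in-place path, the request would look roughly like the sketch below - the same body with restorationStrategyType and the import-specific fields left out. This is only a rough sketch (exact fields may vary), and it only makes sense against a stopped node:

{
    "type": "restore",
    "cassandraDirectory": "/var/lib/cassandra",
    "storageLocation": "s3://my-bucket/cassandra/dc1/99fe4904-91ad-402a-bc0d-6acbb31def99",
    "snapshotTag": "hello",
    "k8sNamespace": "craig",
    "k8sSecretName": "c-backup-secret",
    "entities": "system_auth",
    "updateCassandraYaml": true,
    "globalRequest": false
}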

cin commented 4 years ago

Thanks for the detailed reply. No worries about the docs as it's totally expected given that it's alpha. On the bright side, it's been really easy to find what I'm looking for in the code. :) It did take me a minute to figure out how to restore in general as I didn't realize it requires two calls (download and import). Thankfully you have tests that showed me the way. 💯

Also, I'm only using system_auth for testing since it's small and uses NetworkTopologyStrategy. You're definitely right that it's not a keyspace you'd even back up, let alone restore, under normal circumstances. I'll switch over to using a non-system table and cassandra-stress to generate some data.

Initially, the only reason I set globalRequest to false was because it didn't work with true. ;) So I thought I'd remove the coordination and try to simplify. I actually tried it with storageLocation: s3://my-bucket/cluster-name/dc-name/node-id and got the same error.

[pool-1-thread-2] ERROR com.instaclustr.cassandra.backup.impl.restore.RestorationPhase$DownloadingPhase - Downloading phase has failed: There is not one key which satisfies key filter: [S3ObjectSummary{bucketName='my-bucket', key='cassandra/dc1/792aa697-ca02-4b44-83c0-1dc6c48ef47f/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec', eTag='ab34f1c0e87d136f503de0b238049575', size=5158, lastModified=Tue Jul 14 07:29:22 UTC 2020, storageClass='STANDARD', owner=S3Owner [name=6f862b8a-db99-4820-93f5-0b1251d0daca,id=6f862b8a-db99-4820-93f5-0b1251d0daca]}, S3ObjectSummary{bucketName='my-bucket', key='cassandra/dc1/99fe4904-91ad-402a-bc0d-6acbb31def99/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec', eTag='68a294ccd1abd9104dbbb5e88f451bdc', size=1971, lastModified=Tue Jul 14 07:29:21 UTC 2020, storageClass='STANDARD', owner=S3Owner [name=6f862b8a-db99-4820-93f5-0b1251d0daca,id=6f862b8a-db99-4820-93f5-0b1251d0daca]}, S3ObjectSummary{bucketName='my-bucket', key='cassandra/dc1/a0eb1f45-5f0f-4589-9919-c1d32de2c68d/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec', eTag='f88d981fad23a80184d421372eb3b21c', size=6100, lastModified=Tue Jul 14 07:29:22 UTC 2020, storageClass='STANDARD', owner=S3Owner [name=6f862b8a-db99-4820-93f5-0b1251d0daca,id=6f862b8a-db99-4820-93f5-0b1251d0daca]}, S3ObjectSummary{bucketName='my-bucket', key='cassandra/dc2/3852f40d-8bfe-4bc2-b529-3aba699b44d7/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec', eTag='90a50752c6813e9d6910d09f2dcbf735', size=4198, lastModified=Tue Jul 14 07:29:22 UTC 2020, storageClass='STANDARD', owner=S3Owner [name=6f862b8a-db99-4820-93f5-0b1251d0daca,id=6f862b8a-db99-4820-93f5-0b1251d0daca]}, S3ObjectSummary{bucketName='my-bucket', key='cassandra/dc2/b5cab591-bee5-4649-ac8d-ad80cef79287/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec', eTag='c027f679c82e29a21cdffe3f4dd10504', size=1754, lastModified=Tue Jul 14 07:29:21 UTC 2020, storageClass='STANDARD', owner=S3Owner [name=6f862b8a-db99-4820-93f5-0b1251d0daca,id=6f862b8a-db99-4820-93f5-0b1251d0daca]}, S3ObjectSummary{bucketName='my-bucket', key='cassandra/dc2/bcc2d49c-0416-40ff-b00f-ae208513ceeb/manifests/hello-3de05c49-3c0d-39b1-918a-b415379f02ec', eTag='39babb109aa669aee29eaf0cb21c1aba', size=5149, lastModified=Tue Jul 14 07:29:22 UTC 2020, storageClass='STANDARD', owner=S3Owner [name=6f862b8a-db99-4820-93f5-0b1251d0daca,id=6f862b8a-db99-4820-93f5-0b1251d0daca]}]
[pool-1-thread-2] ERROR com.instaclustr.operations.Operation - Operation 839730fa-858d-44de-a667-1079e9590b25 has failed.
com.instaclustr.operations.OperationCoordinator$OperationCoordinatorException: [ResultEntry{failed=false, operation='RestoreOperation{id=839730fa-858d-44de-a667-1079e9590b25, creationTime=2020-07-14T14:52:36.820Z, request=RestoreOperationRequest{storageLocation=StorageLocation{rawLocation=s3://my-bucket/cassandra/dc1/99fe4904-91ad-402a-bc0d-6acbb31def99, storageProvider=s3, bucket=my-bucket, clusterId=cassandra, datacenterId=dc1, nodeId=99fe4904-91ad-402a-bc0d-6acbb31def99, fileBackupDirectory=null, cloudLocation=true}, concurrentConnections=10, cassandraDirectory=/var/lib/cassandra, restoreSystemKeyspace=false, snapshotTag=hello, entities=DatabaseEntities{keyspaces=[system_auth], keyspacesAndTables={}}, restorationStrategyType=IMPORT, restorationPhase=DOWNLOAD, import=ImportOperationRequest{keyspace=null, table=null, keepLevel=false, noVerify=false, noVerifyTokens=false, noInvalidateCaches=false, quick=false, extendedVerify=false, sourceDir=/var/lib/cassandra/data/downloadedsstables}, noDeleteTruncates=false, noDeleteDownloads=false, noDownloadData=false, schemaVersion=null, exactSchemaVersion=false, updateCassandraYaml=true, k8sNamespace=craig, k8sSecretName=c-backup-secret, globalRequest=false}, state=RUNNING, failureCause=null, progress=0.0, startTime=2020-07-14T14:52:36.820Z, shouldCancel=false}', exceptionMessage='com.instaclustr.cassandra.backup.impl.restore.RestorationPhase$RestorationPhaseException: Unable to pass DOWNLOAD phase.'}]
        at com.instaclustr.cassandra.backup.impl.restore.RestoreOperation.run0(RestoreOperation.java:115)
        at com.instaclustr.operations.Operation.run(Operation.java:100)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

I do think there's a valid use case for this even though I was just trying it for testing. Generally, you'd do entire-cluster backups daily, weekly, or whatever. However, you could have a node/persistent volume (or even an entire zone) die and only want to restore the keyspaces/tables for that specific node (or nodes). That would afford you the luxury of not having to stream all the data to the new node(s). I'm not familiar enough with the backup strategy yet to understand why a UUID (which is at least consistent between backups) is used as opposed to the node name, so I'm not sure how you'd know what data belongs to which node when backing up an entire cluster. Maybe this isn't a use case your team needs to support? Do you just stream the data over if a node or subset of nodes dies?

Again, thanks for the time helping sort this out!

smiklosovic commented 4 years ago

To answer your question about the restoration: I've just finished this. There will be a topology file where this mapping is done, like this:

{
  "topology" : [ {
    "hostname" : "cassandra-test-cluster-dc1-west1-b-0.cassandra-test-cluster-dc1-seeds.default.svc.cluster.local",
    "cluster" : "test-cluster",
    "dc" : "dc1",
    "rack" : "west1-b",
    "hostId" : "a9753ddd-76db-4c4d-81f8-8ab6ca93a3bb",
    "ipAddress" : "10.244.2.201"
  }, {
    "hostname" : "cassandra-test-cluster-dc1-west1-a-0.cassandra-test-cluster-dc1-seeds.default.svc.cluster.local",
    "cluster" : "test-cluster",
    "dc" : "dc1",
    "rack" : "west1-a",
    "hostId" : "3e32961c-fa7b-4522-9a88-6055bfba1a35",
    "ipAddress" : "10.244.1.25"
  }, {
    "hostname" : "cassandra-test-cluster-dc1-west1-c-0.cassandra-test-cluster-dc1-seeds.default.svc.cluster.local",
    "cluster" : "test-cluster",
    "dc" : "dc1",
    "rack" : "west1-c",
    "hostId" : "c759d0f6-82b4-407d-81bb-c0787f50b1c6",
    "ipAddress" : "10.244.2.202"
  } ]
}

So the logic is: "if a pod has this name, I fetch this topology file, parse it into a Java model representation and pick the hostId for that pod name".

You might have the same cluster name and the same datacenter name while those clusters live in different k8s namespaces, so you need to tell them apart by saving the data under host IDs.

The problem with restoring a single node is that it is quite hard to do while the cluster is running. The cleanest way would be to 1) decommission that node, 2) delete that pod, and 3) start that pod again in "restore" mode so it restores and rejoins the cluster. If that node is not reachable at all, one would probably have to remove / assassinate it (or something similar) and start a new node. This is quite advanced stuff and our operator does not support it natively; additional work would be needed in this regard.

smiklosovic commented 4 years ago

@cin I'll try to do what you want in a standalone cluster not running in Kubernetes - restore just one node, pretending that e.g. a whole AZ is down. Command-wise, with the sidecar binary it should be possible, but wrapping that into whatever environment you have will be up to you.

cin commented 4 years ago

Thanks for the clarification around the host IDs, @smiklosovic. That makes complete sense given that names could conflict across namespaces. It seems the topology file could be used to resolve a hostname to a host ID. I can think of a few ways this could work, but I need to investigate further what's possible with the current capabilities of the sidecar/backup. I have a few other things on my plate at the moment, but I'll be back on this tomorrow. Thanks again!