[BUG] Restore with hardlinks fails in IMPORT phase

rjb1971 commented 2 years ago

Describe the bug I'm trying to restore a created snapshot which fails with error:

Error{source=cassandra-dev01-ird, message=Hardlinking phase finished with errors, the linking of downloaded SSTables to Cassandra directory has failed., throwable=com.instaclustr.esop.impl.restore.RestorationPhase$RestorationPhaseException: Unable to pass IMPORT phase.}

To Reproduce make backup curl --header "Content-Type: application/json" --data '{"type":"backup", "globalRequest":"true", "storageLocation" : "ceph://cassandra-icarus2-ird-backup-dev/cassandra_gms_dev/rc3/1", "metadataDirective":"REPLACE", "dataDirs":["/icarus/cassandra/data/data"], "skipRefreshing":"true"}' cassandra-dev00-ird:4567/operations

restore backup curl --header "Content-Type: application/json" --data '{"type":"restore", "globalRequest":"true", "dataDirs":["/icarus/cassandra/data/data"], "snapshotTag":"autosnap-1645605747", "restorationPhase":"INIT", "restorationStrategyType":"HARDLINKS", "storageLocation":"ceph://cassandra-icarus2-ird-backup-dev/cassandra_gms_dev/rc3/2", "import":{"type":"import", "sourceDir":"/icarus/tmp/"}, "resolveHostIdFromTopology":"true", "cassandraDirectory":"/icarus/cassandra/data/"}' cassandra-dev00-ird:4567/operations

Expected behavior restore should complete without errors.

Versions (please complete the following information):

OS "Ubuntu 20.04.3 LTS (Focal Fossa)"
cassandra_version: 3.11.6
openjdk version "1.8.0_322" OpenJDK Runtime Environment (Temurin)(build 1.8.0_322-b06) OpenJDK 64-Bit Server VM (Temurin)(build 25.322-b06, mixed mode)
icarus version 2.0.2

Additional context command used: curl --header "Content-Type: application/json" --data '{"type":"restore", "globalRequest":"true", "dataDirs":["/icarus/cassandra/data/data"], "snapshotTag":"autosnap-1645605747", "restorationPhase":"INIT", "restorationStrategyType":"HARDLINKS", "storageLocation":"ceph://cassandra-icarus2-ird-backup-dev/cassandra_gms_dev/rc3/2", "import":{"type":"import", "sourceDir":"/icarus/tmp/"}, "resolveHostIdFromTopology":"true", "cassandraDirectory":"/icarus/cassandra/data/"}' cassandra-dev00-ird:4567/operations

Part of icarus logging [] - 08:53:42.480 [pool-4-thread-3] INFO c.i.e.i.r.RestorationPhase$HardlinkingPhase - Hardlinking phase has started. [] - 08:53:43.565 [pool-4-thread-3] INFO c.i.e.i.Manifest - Resolved manifest: cassandra_gms_dev/rc3/982ab74c-cd42-4885-a986-b61c3eb186b4/manifests/autosnap-1645605747-73919840-436c-3d25-a9d8-42659f4e5722-1645605756900.json [] - 08:54:31.818 [pool-4-thread-3] ERROR c.i.e.i.r.RestorationPhase$HardlinkingPhase - Unable to create a hardlink from /icarus/tmp/gms/labels-bf41945013b011ebb610898044132165/md-146-big-CompressionInfo.db to /icarus/cassandra/data/data/gms/labels-bf41945013b011ebb610898044132165/md-146-big-CompressionInfo.db, skipping the linking of all other resources and deleting already linked ones. java.nio.file.FileAlreadyExistsException: /icarus/cassandra/data/data/gms/labels-bf41945013b011ebb610898044132165/md-146-big-CompressionInfo.db -> /icarus/tmp/gms/labels-bf41945013b011ebb610898044132165/md-146-big-CompressionInfo.db at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixFileSystemProvider.createLink(UnixFileSystemProvider.java:476) at java.nio.file.Files.createLink(Files.java:1086) at com.instaclustr.esop.impl.restore.RestorationPhase$HardlinkingPhase.execute(RestorationPhase.java:539) at com.instaclustr.esop.impl.restore.strategy.AbstractRestorationStrategy.restore(AbstractRestorationStrategy.java:93) at com.instaclustr.esop.impl.restore.coordination.BaseRestoreOperationCoordinator.coordinate(BaseRestoreOperationCoordinator.java:57) at com.instaclustr.icarus.coordination.IcarusRestoreOperationCoordinator.coordinate(IcarusRestoreOperationCoordinator.java:139) at com.instaclustr.esop.impl.restore.RestoreOperation.run0(RestoreOperation.java:148) at com.instaclustr.operations.Operation.run(Operation.java:247) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) [] - 08:54:31.840 [pool-4-thread-3] ERROR c.i.e.i.r.RestorationPhase$HardlinkingPhase - Hardlinking phase has failed: Hardlinking phase finished with errors, the linking of downloaded SSTables to Cassandra directory has failed.

Data on filesystem ls -al /icarus/cassandra/data/data/gms/labels-bf41945013b011ebb610898044132165/md-146-big-CompressionInfo.db -rw-r--r-- 2 1001 root 43 Feb 22 14:56 /icarus/cassandra/data/data/gms/labels-bf41945013b011ebb610898044132165/md-146-big-CompressionInfo.db

smiklosovic commented 2 years ago

Wow. Are you starting from scratch with the latest stuff? I have junit tests testing this and it just passes fine.

smiklosovic commented 2 years ago

What did the previous stage doing? There should be TRUNCATE phase before hardlinking which should remove all stuff from Cassandra table so linking will link it. The error you got says that the file is still there (FileAlreadyExistsException).

Truncating happens only on the coordinator as it will call truncate via jmx and Cassandra will internally do the rest with other nodes. Truncate operation in Cassandra is in this sense "distributed" to all nodes.

Try to be also explicit what entities you want to backup and restore via 'entities' field.

I bet you hit this (1). Your 'entities' were empty so it was thinking you do not want to "truncate" anything, so it did not, and then on hardlinking these filese were still there. As I said, try to be explicit what keyspaces / tables you want to restore.

If you do not set entities on backup, it will backup all your database, but you need to be explicit on restoration what you want to restore.

(1) https://github.com/instaclustr/esop/blob/master/src/main/java/com/instaclustr/esop/impl/restore/RestorationPhase.java#L361

rjb1971 commented 2 years ago

Yes, it could be that those tables are empty, because the restore did fail before (see [BUG] icarus restore fails when dropped tables still have cassandra snapshots or backups #9) Which tructated all the tables and stopped.

I tried to restore the backup from before this failure and got this error. After this I tried a backup with the old version, but somehow i get checksum errors on those backups (still not sure why and how that happened). But when i make a backup (of course with some empty tables) and restore it using version 1.1.0, it does works But i don't know how the tooling handle empty tables. It make sense it works if the tool skips those empty tables.

It looks like you say that it is not possible to restore tables that are currently empty, but that sounds strange to me. I isn't that is one of the reasons, you have to do a restore?

About setting entities: I need to restore the complete database to see if all the tables are correctly backed-up and could be restored. And when there are problems, which needs a restore, you want to set the database back to a consistent state, which means in most cases the complete database. Or do i miss something?

smiklosovic commented 2 years ago

If you want to restore all database on running cluster, you can realistically restore only user keyspaces / tables, not system ones, you just can not change system tables while your node is running. So if you want to restore all user keyspaces, the best shot is to enumerate them all like "entities": "ks1,ks2,ks3", this will automatically restore all tables a particular keyspace consists of.

How many keyspaces your database consists of? Is it really too problematic to enumerate all of them upon restoration?

rjb1971 commented 2 years ago

At the moment it isn't much (4 or 5), but i'm was afraid to forget one, when things changes. Not a big deal and I wil change it, to prevent failures on system tables.

So how do we solve the restore problem, because it fails on user keyspace table.

smiklosovic commented 2 years ago

You can either stop the node, remove sstables, start the node and try to restore again so linking will be successful as the files will not be there anymore, or you can restart that restoration process again and see if TRUNCATE phase will remove them?

rjb1971 commented 2 years ago

You can either stop the node, remove sstables, start the node and try to restore again so linking will be successful as the files will not be there anymore

I don't like this idea, because it is a test to see if we can restore when there is a problem. Stopping and deleting manually feels like i don't need iCarus at all.

you can restart that restoration process again and see if TRUNCATE phase will remove them?

I did rerun this restore (without entities) multiple times and it didn't remove the files before. I can't see a reason why it will work when i supply the entities in the restore request.

For me the big question is: In case there is an empty table (for example a delete cleared all entries by mistake) and i want to restore the content. Will the restore fails, due to the empty table? If it does isn't that a BUG? It doesn't feel right to manually try to restore the state, because our cassandra state is reached, with normal database commands (as far i can see). Therefore (in my opinion) the restore should be able to deal with this situation.

smiklosovic commented 2 years ago

I can't see a reason why it will work when i supply the entities in the restore request.

As I said previously, because you not specifying any entities will likely evaluate it like there is nothing to truncate (1). So it will not truncate anything, hence all your sstables will be still there. So when you are going to restore and it wants to link the file from your import dir (where they were downloaded) to the actual destination in live table dir in Cassandra, it will not be successful because that target file for link is already there - because truncating process has not removed it - because your entities were empty.

You can get all your keyspaces like this:

$ echo 'select keyspace_name from system_schema.keyspaces;' | cqlsh | grep -v "system"

Answer to your question:

So, if you delete all the table, it depends how you did it. Either you used TRUNCATE command in CQL manually, in that case it might create auto snapshot (which is true by default) or not. Or you literally delete it row by row, in that case it will just create new sstables containing tombstones so (after flush) there will be more sstables on disk with them until you compact them.

Whatever the case is, it is important that Icarus issues TRUNCATE on these keyspaces / tables, so it will physically either remove it or it will move it to snapshot. TRUNCATE is crucial because it will empty your table dir so you are able to link it from what you downloaded.

(1) https://github.com/instaclustr/esop/blob/master/src/main/java/com/instaclustr/esop/impl/restore/RestorationPhase.java#L361

rjb1971 commented 2 years ago

Oh, Sorry now i understand. I have the feeling i get in quicksand at the moment. I tried it, but somehow we got the dropped view problem again, but in the reshresh state??? So i removed the view related directories again and tried the restore again. And now i get the hardlink problem with an index??? /icarus/cassandra/data/data/gms/incident_users-b4b2551013b011eba5f4b14d6a129718/.incident_users_user_id_idx/md-589-big-Data.db -> /icarus/restore/gms/incident_users-b4b2551013b011eba5f4b14d6a129718/.incident_users_user_id_idx/md-589-big-Data.db

So i'm not sure how to get to a stable situation :-(

I removed the files and started the restore, but still fails. Looks like the files are created in the steps before the hardlink

smiklosovic commented 2 years ago

You can execute only one phase, not all of them. You can only execute TRUNCATE phase in "restorationPhase" field. Normally, if you are restoring and your phase is INIT and your globalRequest is true, it will bubble up through all phases automatically, INIT, DOWNLOAD, TRUNCATE, IMPORT, CLEANUP.

But you can make it more granular, so you can execute only one phase, for example, you can execute just TRUNCATE and nothing more. So you can debug your problems in more detailed and granular level.

You achieve it by setting "singlePhase": "true" and "restorationPhase": "TRUNCATE" and "globalRequest": "true"

https://github.com/instaclustr/esop/blob/master/src/main/java/com/instaclustr/esop/impl/restore/RestoreOperationRequest.java#L164-L168

rjb1971 commented 2 years ago

Did run the separately. and removed all the files which prevented the IMPORT stage to complete, (several index files) And finally I have completed all the phases. I will try to backup and fully restore the database tomorrow again. There is one thing i can't explain. The schema of one table has changed between the backup and execution of the restore. But it looks like the schema wasn't restored. I'm not sure if i forgot something or it is because of the many restore failures.

smiklosovic commented 2 years ago

What you mean about schema not being "restored" specifically?

smiklosovic commented 2 years ago

if you have a schema of version X, you take a backup, then you alter it to version X+1 and you restore, it will restore SSTables from times of version X to schema of version X+1. Yes. This is true.

read this please: https://github.com/instaclustr/esop#restoring-into-different-schemas

rjb1971 commented 2 years ago

Oh i missed that too. We added some columns to a table and expected them to be removed too during the restore.

Still need to do a new backup and restore round. Will keep you posted

smiklosovic commented 2 years ago

That is in general too complicated to do. We would have to create keyspace too if you managed to drop that as well with table, we need to know the topology upon keyspace creation so restoring fits the nodes and so on ... lot of problems.

rjb1971 commented 2 years ago

did a backup and restore: and this worked as expected. Then i added a record to a table and run the restore again and run into the same problem again

[] - 10:14:52.744 [pool-4-thread-3] ERROR c.i.e.i.r.RestorationPhase$HardlinkingPhase - Unable to create a hardlink from /icarus/restore/gms/incident_users-b4b2551013b011eba5f4b14d6a129718/.incident_users_user_id_idx/md-23-big-CompressionInfo.db to /icarus/cassandra/data/data/gms/incident_users-b4b2551013b011eba5f4b14d6a129718/.incident_users_user_id_idx/md-23-big-CompressionInfo.db, skipping the linking of all other resources and deleting already linked ones. java.nio.file.FileAlreadyExistsException: /icarus/cassandra/data/data/gms/incident_users-b4b2551013b011eba5f4b14d6a129718/.incident_users_user_id_idx/md-23-big-CompressionInfo.db -> /icarus/restore/gms/incident_users-b4b2551013b011eba5f4b14d6a129718/.incident_users_user_id_idx/md-23-big-CompressionInfo.db at sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)

smiklosovic commented 2 years ago

yeah, indexes again?

rjb1971 commented 2 years ago

Yep the index, when i manually remove them and rerun the IMPORT phase it works, But isn't wanted behavior

smiklosovic commented 2 years ago

Yeah I ll take a loot next week. I am sorry you are hitting this issues. I have to admit it is little bit rough around the edges but we are trying.

rjb1971 commented 2 years ago

I know, we also looked to see if we can make it simpler, without icarus, but i'm afraid we just rebuild icarus and hit all the same problems we do now.

rjb1971 commented 2 years ago

I don't think, we did something special, but in case you need this:

CREATE TABLE IF NOT EXISTS gms.incident_users (
   incident_id text,
   user_id text,
   full_name text,
   created timestamp,
   organisational_unit text,
   PRIMARY KEY (incident_id, user_id)
);

CREATE INDEX IF NOT EXISTS ON gms.incident_users (user_id);

smiklosovic commented 2 years ago

So backup works but restore doesnt? When indexes are involved. Can you check you have indexes backed up too?

rjb1971 commented 2 years ago

I checked in our S3 repository and there are several directories (41) in

....gms/incident_users-b4b2551013b011eba5f4b14d6a129718/.incident_users_user_id_idx

with db files.

And are also mentioned in the manifest file.

So i would say the indexes are backed-up

rjb1971 commented 2 years ago

any progress?

smiklosovic commented 2 years ago

Wow time flies ... i should have some time in couple of days to look at this.

rjb1971 commented 2 years ago

Still no progress? Can I help with something?

smiklosovic commented 2 years ago

hey, there was actually some movement in Esop, check the lastest master of Esop. It fixes a case when there is a dropped empty snapshot. You can build it and then build Icarus (mvn clean install -DskipTests) for both. Appart from that I have not progressed with indexes sorry ...

smiklosovic commented 2 years ago

looking into indexes issue now

smiklosovic commented 2 years ago

@rjb1971 please try this one, I hit your issue with indexes and I fixed that (at least it does not occur anymore for me :) )

Let me know how it goes.

https://oss.sonatype.org/content/repositories/snapshots/com/instaclustr/icarus/2.0.3-SNAPSHOT/icarus-2.0.3-20220412.202823-5.jar

rjb1971 commented 2 years ago

This is good news, I will try to look at it tomorrow.

Op di 12 apr. 2022 om 22:31 schreef Štefan Miklošovič < @.***>:

@rjb1971 https://github.com/rjb1971 please try this one, I hit your issue with indexes and I fixed that (at least it does not occur anymore for me :) )

Let me know how it goes.

https://oss.sonatype.org/content/repositories/snapshots/com/instaclustr/icarus/2.0.3-SNAPSHOT/icarus-2.0.3-20220412.202823-5.jar

— Reply to this email directly, view it on GitHub https://github.com/instaclustr/icarus/issues/10#issuecomment-1097187472, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB55DW2J67URCNQ7YIXWGSTVEXMRHANCNFSM5QYA6F4Q . You are receiving this because you were mentioned.Message ID: @.***>

rjb1971 commented 2 years ago

I tested the new version, but sadly the restore failed again. Looks like a new problem: "Failed tables to refresh: {gms.incidents_by_label=Unknown keyspace/cf pair (gms.incidents_by_label), groups.groups_user_count=Unknown keyspace/cf pair (groups.groups_user_count), gms.meldkamers=Unknown keyspace/cf pair (gms.meldkamers)}

These are tables which are dropped a long time ago and have dropped snapshots. I assume the refresh shouldn't be performed on those tables

ls -l meldkamers-*/snapshots meldkamers-1477b1209afb11ec86a5b14d6a129718/snapshots: total 0 drwxr-xr-x 2 1001 root 265 Mar 3 15:35 dropped-1646321710153-meldkamers

meldkamers-1fc150c0a01d11eaa1af898044132165/snapshots: total 4 drwxrwxrwx 2 1001 root 4096 Jun 3 2020 dropped-1591172741422-meldkamers

meldkamers-b272664013a711eba5f4b14d6a129718/snapshots: total 0 drwxr-xr-x 2 1001 root 265 Oct 21 2020 dropped-1603293322975-meldkamers

meldkamers-b5e026b013b011eba5f4b14d6a129718/snapshots: total 0 drwxr-xr-x 2 1001 root 265 Oct 21 2020 dropped-1603293546384-meldkamers

meldkamers-d88579e0a52211ecb55b898044132165/snapshots: total 0 drwxr-xr-x 2 1001 root 265 Mar 16 12:19 dropped-1647433194987-meldkamers

smiklosovic commented 2 years ago

How is that possible it is refreshing that? What is your restoration request body? Why you just can not go and remove these tables?

rjb1971 commented 2 years ago

Yes i was also a bit of a surprise to me too. In the previous versions i didn't have this problem. When i deleted the indexes by hand, i didn't have this problem. Yes i can delete those directories, but it feels not right. The tool should be able to coop with this. And it did before.

I used: curl --header "Content-Type: application/json" --data '{"type":"restore", "globalRequest":true, "dataDirs":["/icarus/cassandra/data/data"], "snapshotTag":"autosnap-1649920483", "restorationPhase":"INIT", "restorationStrategyType":"HARDLINKS", "storageLocation":"ceph://cassandra-icarus2-ird-backup-dev/cassandra_gms_dev/rc3/2", "import":{"type":"import", "sourceDir":"/icarus/restore/"}, "resolveHostIdFromTopology":"true", "cassandraDirectory":"/icarus/cassandra/data/", "entities":"archive,gms,groups,notifications"}' cassandra-dev00-ird:4567/operations

smiklosovic commented 2 years ago

@rjb1971 try this https://oss.sonatype.org/content/repositories/snapshots/com/instaclustr/icarus/2.0.3-SNAPSHOT/icarus-2.0.3-20220414.094209-6.jar

I dont guarantee it will fix it but I think so. I just put together something quick which does make sense to me.

I think it is happening because we stopped to care about dropped tables, it will not fail as it used to but we got this. It is complicated. The reason this is all happening is that I can not recognize if a table is still active or not just by looking at the directory structure when I see that it has dropped- snapshot - that is not enough because you can have that table in fact active again, under same table id, when you recreate that table again sometime in the future after you dropped it. So then you will have a table which is active from Cassandra point of view but it contains dropped- snapshot I used to care about that when I was about to evaluate if that table is active or not.

The more appropriate way to do this would be to ask via JMX or CQL what tables Cassandra actually has, so I would not parse it on the disk, but then in-place restoration would be kind of problematic because we do not have a running node to ask (in-place is different from hard-linking in such fashion that your node needs to be down to restore, it is used only in Esop, it is not invokable from Icarus).

rjb1971 commented 2 years ago

OK will try this. I keep your posted

rjb1971 commented 2 years ago

Tested it and it complete successfully now, but i still see the errors in the log. Not sure if that happened in previous versions too.

smiklosovic commented 2 years ago

I just ignored it and I do not throw it anymore but it might be still logged ... that is fine. It is harmless to refresh a table which does not exist.

rjb1971 commented 2 years ago

I agree.. ThanX for your effort to fix our problems and i think your product is improved by doing so. It is really appreciated. All I need now is a new release. :-) And again thank you.

smiklosovic commented 2 years ago

Yes, next week I ll release it. I ll ping you.

rjb1971 commented 2 years ago

OK thX

rjb1971 commented 2 years ago

reminder: what is the ETA of the new release?

smiklosovic commented 2 years ago

hey, I am a little bit busy atm, I should do it until the end of the week.

smiklosovic commented 2 years ago

I am on it, we have freeze in Cassandra project from 1st May so all my attention went there.

smiklosovic commented 2 years ago

@rjb1971 it is released as 2.0.3. I am closing this.

rjb1971 commented 2 years ago

It is ok I can start with the new release not sooner than next week

Op ma 2 mei 2022 15:49 schreef Štefan Miklošovič @.***>:

I am on it, we have freeze in Cassandra project from 1st May so all my attention went there.

— Reply to this email directly, view it on GitHub https://github.com/instaclustr/icarus/issues/10#issuecomment-1114905970, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB55DW4PBWRXKOSBUGV4LOLVH7MPVANCNFSM5QYA6F4Q . You are receiving this because you were mentioned.Message ID: @.***>

instaclustr / icarus

[BUG] Restore with hardlinks fails in IMPORT phase #10