restic / restic

Fast, secure, efficient backup program
https://restic.net
BSD 2-Clause "Simplified" License

Integrity check failed: Data seems to be missing #3908

Closed johndoe31415 closed 1 year ago

johndoe31415 commented 2 years ago

Output of restic version

restic 0.14.0 compiled with go1.19 on linux/amd64

How did you run restic exactly?

$ time restic prune --repack-small -r restic
enter password for repository:
repository 5dc49c39 opened (repository version 1) successfully, password is correct
loading indexes...
loading all snapshots...
finding data that is still in use for 590 snapshots
[9:14] 100.00%  590 / 590 snapshots
searching used packs...
{<data/010e7a8d> <data/02d05256> <data/07b7801c> <data/095b1f9a> <data/0eeb743d> <data/1d9ba0fe> <data/20cd7da5> <data/248c5a30> <data/2611ce54> <data/29c04ffd> <data/2a4fe36f> <data/2ce7c286> <data/310b799d> <data/314bddb9> <data/3178900f> <data/3582b713> <data/367d74bf> <data/36b6519b> <data/377028ba> <data/38e45ee1> <data/3a5c9227> <data/3aad720c> <data/3b4543f0> <data/3e2f5338> <data/3e5e8ed9> <data/3ee21cc1> <data/405de58d> <data/430b274a> <data/430e114e> <data/44202514> <data/464bb775> <data/47c4e645> <data/48b00f7a> <data/48ebb6e6> <data/4937458a> <data/4a68142d> <data/4af584ff> <data/4f8f7a17> <data/4fee89b0> <data/521bbda0> <data/5a5970c2> <data/5d0f6372> <data/6cecee6b> <data/6d7c87e0> <data/6d8c51b0> <data/6f757bd7> <data/6fd300a4> <data/7227900e> <data/731e3123> <data/742b5681> <data/7433576e> <data/74e4487c> <data/761ab7d6> <data/78a4338f> <data/79cd8a35> <data/7abca4b4> <data/82e9bf88> <data/82f45796> <data/851e5f99> <data/86b5624b> <data/89ec9473> <data/8add445a> <data/8f9f0fc4> <data/93a46240> <data/9442d7f8> <data/944a29df> <data/944bacc6> <data/976ae5c2> <data/977aac8b> <data/99a60a35> <data/9ef48f3f> <data/a085ad1a> <data/a0d44434> <data/a267fc37> <data/a696241a> <data/aba33157> <data/acca15a3> <data/b27b90b0> <data/b914da51> <data/bbd903c6> <data/bf1565b0> <data/c16a95da> <data/c1bddf0a> <data/c2ab6979> <data/c39074e4> <data/c4bd55cc> <data/c63e1996> <data/c86b412b> <data/cb0be6af> <data/cb35263b> <data/cea8c925> <data/ceb080be> <data/cf09ffc3> <data/d1105822> <data/d2cbf730> <data/d3fbde99> <data/d84eb8dc> <data/da08e778> <data/da13588e> <data/db49afa2> <data/db73182d> <data/dd86d3e1> <data/ded3893e> <data/e0d9a443> <data/e1b79a12> <data/e383df15> <data/e414a8a7> <data/e62d7265> <data/e976432d> <data/eb589ec1> <data/ec09f1c9> <data/f0323621> <data/f0670376> <data/f0bf9850> <data/f33268b1> <data/f6aad37e> <data/f939c7f3> <data/fb05d16b> <data/fc0366fd> <data/fed7b594>} not found in the index

Integrity check failed: Data seems to be missing.
Will not start prune to prevent (additional) data loss!
Please report this error (along with the output of the 'prune' run) at
https://github.com/restic/restic/issues/new/choose
Fatal: index is not complete

real    13m36,947s
user    26m35,532s
sys 0m50,592s

What backend/server/service did you use to store the repository?

Local file storage.

Expected behavior

I expect restic to prune my data.

Actual behavior

It doesn't prune, but crashes and tells me to file a bug -- which is this one.

Steps to reproduce the behavior

It reproducibly crashes when run as shown above; the system in question had only 8 GB of RAM, so it previously crashed with an OOM error. I've added 16 GB of swap, and now the above error is the result.
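(For the record, the swap was added roughly along these lines; the file name is arbitrary:)

$ sudo fallocate -l 16G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile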

Do you have any idea what may have caused this?

This is the backup of my main machine (~4 TB of storage). I've never pruned it before because there is plenty of space on my target. The oldest snapshot is roughly 4 years old, and the repository occupies 6.7 TB in total across about 600 snapshots and 1.4 million files:

6c9ac213  2018-07-26 23:41:55  reliant                 /

Recently, I wanted to start pruning the repository and maybe get rid of some stuff. Prune failed with a fatal error because some of the pack IDs were wrong. So I followed the advice I read in a ticket here and first ran

$ time restic check --read-data -r restic/

which found some corrupt packs, among them:

Pack ID does not match, want 30d136f7d93a5a11b5db55c37ff2246de89db59ca18f11d4ec6c3a951a6afc43, got 22f449c1fc5469959d0f022d2555fcdd9c0a8766f4d41033271ae10fb5689e48
Pack ID does not match, want c1906d2c605dcd86f1dfe3af21378e3bc79b56236ba30ad5d33c02d7f7be1996, got 5c64b60a61fef33186d543009b4b745e3a44da5543d2f6bc8717ed5fb4cad1ad
Pack ID does not match, want d17ec7e883c15b517ecdf9c102b6dded07bac3651d7c0a298b5c99ae299a8ddb, got f064a431952e8e52a017ce47d6e902aa251dcea8c9fcd4185a86c29c9529b8ec
Pack ID does not match, want 01b71b367d6af7c5683726c00763286a4823a4585bf8c28f8438eba5dfb938b8, got 3d01b77bfdaef45255c46d07c8af635644f91e664a698253d93316722fd1e3a7
[...]

I followed the advice and moved those packs out of their directories into a "corrupt" directory outside of the restic storage. Then I made a full copy of my index directory and finally ran restic rebuild-index.
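(Roughly, the steps were along these lines, using one of the damaged pack IDs from above as an example; index-backup is just a name I picked, and local repositories store packs under data/<first two hex characters of the pack ID>:)

$ mv restic/data/30/30d136f7d93a5a11b5db55c37ff2246de89db59ca18f11d4ec6c3a951a6afc43 corrupt/
$ cp -r restic/index index-backup/
$ restic rebuild-index -r restic/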

Afterwards I tried the prune again, but as you can see, it did not get anywhere.

Note that I also checked the duplicate #3365, but I believe this to be a different root cause (they used a remote storage backend, whereas I have just plain old local files).

Do you have an idea how to solve the issue?

No, but I can restore the index from before the rebuild-index run, and I can also move the corrupt packs back into their old positions if that would be helpful for debugging.

restic prune then also deterministically crashes.

Also note that the blobs restic complains are missing are not the same ones that I removed (because they were corrupted). The corrupt ones are:

total 88M
-rw------- 1 joe joe 4,1M   24.02.2019 23:12:21 01b71b367d6af7c5683726c00763286a4823a4585bf8c28f8438eba5dfb938b8
-rw------- 1 joe joe 4,1M   24.02.2019 23:12:21 0d160f3ec0f37cd0a56b5d3a484c241627bc2f0fda9de602f75a72f01f75242e
-rw------- 1 joe joe 4,2M   24.02.2019 23:12:21 1ca1871b8042e7ee47f04f759cc2ef523abb4cc27f63134c8528a3caf4603694
-rw------- 1 joe joe 4,2M   24.02.2019 23:12:21 210a419b20e20159195f47aad5e23eda4d3c77690f88e963a430424d8bc3ccb3
-rw------- 1 joe joe 4,1M   24.02.2019 23:12:21 214010300ff7ca38c9274dcf4a622729695e42bf1217489f854018f79aa4e269
-rw------- 1 joe joe 4,6M   24.02.2019 23:12:21 30d136f7d93a5a11b5db55c37ff2246de89db59ca18f11d4ec6c3a951a6afc43
-rw------- 1 joe joe 4,3M   24.02.2019 23:12:21 4edcecd83eb7c13f18d87e3619753d2f54d75414783b903fbe79f6f26dbb8535
-rw------- 1 joe joe 4,4M   24.02.2019 23:12:21 563c7ff32e857cf70b707adfb5e5afea239be56c8db0e4e20230342873e54f17
-rw------- 1 joe joe 4,1M   24.02.2019 23:12:21 651b1f292e8ade87cb79a0eedd9ab6e1cdbef5affdb392014e4013e409e1d029
-rw------- 1 joe joe 4,2M   24.02.2019 23:12:21 73d47ff16e6bb10c31408bf9e68e59286f5efca57e07b399254066ef7b9225db
-rw------- 1 joe joe 4,1M   24.02.2019 23:12:21 936e2772edcb47fd400649c561730f0b627a307294db17fc7b9eef8275f67947
-rw------- 1 joe joe 4,2M   24.02.2019 23:12:21 96975aa303a940f483142a4a2668ed6223cc8694fff5305bfe3d74ee82684089
-rw------- 1 joe joe 4,2M   24.02.2019 23:12:21 a2f9a70ce3ce4a6f8fc02bb00022b325842aff667455b4efeca14ffbd41316b3
-rw------- 1 joe joe 4,1M   24.02.2019 23:12:21 c0b0be8003850c81da2a3b5ae69d283dd63e4967144570a5b10737f0d7f33440
-rw------- 1 joe joe 4,1M   24.02.2019 23:12:21 c1906d2c605dcd86f1dfe3af21378e3bc79b56236ba30ad5d33c02d7f7be1996
-rw------- 1 joe joe 4,4M   24.02.2019 23:12:21 c78a0120b21261fcd690bd80e021e136b44e25997bdb1817c1073f025dea4b41
-rw------- 1 joe joe 4,1M   24.02.2019 23:12:21 cad03277956e785de5b1ba70f6f04a7dd6c1b0344d54d3de99d1acedd689975e
-rw------- 1 joe joe 4,4M   24.02.2019 23:12:21 d17ec7e883c15b517ecdf9c102b6dded07bac3651d7c0a298b5c99ae299a8ddb
-rw------- 1 joe joe 4,3M   24.02.2019 23:12:21 d82c611475f7db42f84dcc2d594bfd02305a4edb1febad455e76f5626a7fa128
-rw------- 1 joe joe 4,1M   24.02.2019 23:12:21 f7dba458696b75710981adbc1e74ab853a89f94078bff05da004c843c6dedd07
-rw------- 1 joe joe 4,3M   24.02.2019 23:12:21 f9707a1e936413f6947c75e43d0428ee30d2a41571958d068ac1774e118d047a

Did restic help you today? Did it make you happy in any way?

Restic continues to be awesome, no doubt about it.

johndoe31415 commented 2 years ago

Also, because the memory issues have been there for a while (I originally filed #1723 before the prune logic was rewritten), I've now performed some memory profiling. Not sure if this is useful, but I've captured the memory usage of the restic process every 10 seconds and turned it into a plot. In my case, prune takes up about 9 GB of RAM before it dies (X axis is time in seconds, Y axis is memory in GB):
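(One way to capture such samples, as a rough sketch; ps prints the RSS in kilobytes every 10 seconds while restic is running:)

$ while pgrep -x restic > /dev/null; do ps -o rss= -C restic; sleep 10; done > prune-mem.log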

[Screenshot: plot of restic prune memory usage over time, 2022-08-30 16:20:31]

MichaelEischer commented 2 years ago

The damaged pack files appear to be three years old by now, so there's probably not much point in investigating how they were damaged (there have been lots of changes all over the restic codebase since then). Note the difference between pack files and data blobs: each pack file consists of several blobs, which is why the pack filenames are completely different from the missing blob IDs.
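Schematically, a pack file looks roughly like this (a simplified sketch; encryption details omitted):

[blob 1][blob 2]...[blob N][pack header][header length]

The pack file is named after the SHA-256 hash of the complete file, while each blob ID is the SHA-256 hash of that blob's plaintext content, so the two sets of IDs never line up.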

To repair the repository you'll have to follow the steps from https://github.com/restic/restic/issues/828#issuecomment-706186047. For Route 1 you'll probably have to restore the old index and move the pack files back. If you need help with these steps, just ask.
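For a local repository, that could look roughly like this (a sketch; it assumes the index copy was saved to index-backup/, that the removed packs sit in corrupt/, and that packs live under data/<first two hex characters of the pack ID>):

$ rm restic/index/*
$ cp index-backup/* restic/index/
$ for f in corrupt/*; do id=$(basename "$f"); mv "$f" "restic/data/${id:0:2}/"; done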

Regarding the memory usage: the plot does not look totally unexpected; once #3899 is merged, it should top out at about 7 GB. You could set the environment variable GOGC=50 to avoid swapping. This tells Go to garbage collect more often.
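For example:

$ GOGC=50 restic prune --repack-small -r restic

Lower GOGC values make the Go runtime collect more aggressively, trading CPU time for a smaller peak heap.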

johndoe31415 commented 2 years ago

Interesting observation: all corrupt pack files have the same timestamp. I'm guessing restic touches the files so that they have the same mtime as the snapshot, yes? I fully agree, how they got damaged is water under the bridge.

But I gotta admit, the repair procedure, of which I tried following route 2 (missing the difference between pack files and data files), is excruciatingly painful. I do consider myself a fairly advanced user, but it's certainly out of my comfort zone, if not out of my depth.

If these are really old snapshots then I don't care and would just want to delete them, but I need to know that the recent backups are in no way affected (e.g., that deduplication didn't leave recent backups referring to files that were stored in the broken packs).

Right now it's late and I'm tired, and I don't trust myself enough to do this correctly. The restic find command has already been running for a few minutes (10?) with no results so far. And I have to do this 21 times, once for each pack? Oh, and I just noticed I need to perform a debug build of restic. Which means I need to install Go first and figure out how to build it again (I did it once before, but Go is fairly alien to me, to be honest).

This is all making me very unhappy. Not being sure whether my backup is okay definitely makes me uneasy, and seeing that the repair PR is still unmerged 2 years later means it's probably not getting done anytime soon.

The pragmatic solution would be to remove my history and start fresh with a new repository, which kind of defeats the whole purpose of snapshotted incremental backups. No, this is really making me very unhappy :-( I'll see what I do tomorrow.

EDIT: I don't know if that is a workable solution; it'd take a week:

repository a6b293b6 opened (repository version 2) successfully, password is correct
created new cache in /root/.cache/restic
no parent snapshot found, will read all files
[42:08] 0.49%  242724 files 16.058 GiB, total 3603516 files 3.188 TiB, 0 errors ETA 142:02:17

MichaelEischer commented 2 years ago

The pragmatic solution would be to remove my history and start fresh with a new repository, which kind of defeats the whole purpose of snapshotted incremental backups. No, this is really making me very unhappy :-( I'll see what I do tomorrow.

Running rebuild-index, then removing all snapshots, and running backup afterwards would avoid re-uploading everything, but should get the repository back into a usable shape. Although it won't be possible to be completely sure unless restic check --read-data completes successfully.
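As a sketch (the snapshot IDs to forget would come from restic snapshots, and /path/to/data stands in for the actual backup source):

$ restic -r restic/ rebuild-index
$ restic -r restic/ forget <snapshot-id> [...]
$ restic -r restic/ backup /path/to/data
$ restic -r restic/ check --read-data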

But I gotta admit, the repair procedure, of which I tried following route 2 (missing the difference between pack files and data files), is excruciatingly painful. I do consider myself a fairly advanced user, but it's certainly out of my comfort zone, if not out of my depth.

My plan is to eventually let check report which snapshots are damaged, which would make "route 2" much simpler. The repair PR will eventually be merged in some form (the command name will probably change), once I get around to reviewing it.

Starting from restic 0.13.0 you can also pass multiple pack IDs to find.
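For example, with two of the damaged pack IDs from above:

$ restic -r restic/ find --pack 30d136f7d93a5a11b5db55c37ff2246de89db59ca18f11d4ec6c3a951a6afc43 c1906d2c605dcd86f1dfe3af21378e3bc79b56236ba30ad5d33c02d7f7be1996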

If these are really old snapshots then I don't care and would just want to delete them, but I need to know that the recent backups are in no way affected (e.g., that deduplication didn't leave recent backups referring to files that were stored in the broken packs).

After running rebuild-index the new backups will be OK, assuming that all damaged pack files have been removed beforehand.

Which means I need to install Go first and figure out how to build it again (I did it once before, but Go is fairly alien to me, to be honest).

Just get a recent Go compiler (version >= 1.15), check out the master branch, and run go build -tags debug ./cmd/restic. Then you'll have a new restic binary in the current folder.
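In full, something like:

$ git clone https://github.com/restic/restic
$ cd restic
$ go build -tags debug ./cmd/restic
$ ./restic version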

MichaelEischer commented 1 year ago

Closing as there's nothing left to do here.