rdiff-backup / rdiff-backup

Reverse differential backup tool, over a network or locally.
https://rdiff-backup.net/
GNU General Public License v2.0

Cannot make an incremental backup any more, fails with KeyError on index #211

Closed ericzolf closed 4 years ago

ericzolf commented 4 years ago

rdiff-backup cannot make an incremental backup any more. See the attached file: 2019-12-08-error.txt. I'm backing up to an external disk, and sometimes the system is suspended during the rdiff-backup run. Maybe this is related.

Originally posted by @janvlug in https://github.com/rdiff-backup/rdiff-backup/issues/81#issuecomment-562973363

ericzolf commented 4 years ago

I prefer to have a separate issue for this one (even if I might close if it's a duplicate). Can you provide please:

ericzolf commented 4 years ago

@janvlug see previous comment, I didn't realise that you might not get notified of this new issue.

ericzolf commented 4 years ago

@janvlug without further input from you within a week or two, I'll close this bug.

ericzolf commented 4 years ago

Closing due to lack of reaction, feel free to re-open if you can reproduce the issue with the latest beta version.

sevens commented 1 year ago

I got the same issue, can reproduce with beta 2.1.3b3.

Output:

WARNING: this command line interface is deprecated and will disappear, start using the new one as described with '--new --help'.
WARNING: Previous backup seems to have failed, regressing destination now
NOTE: Regressing to date/time Sat Nov 12 19:21:05 2022
NOTE: Starting increment operation from source path /var to destination path /mnt/backup/ssd_mirror/var
WARNING: Expected path /mnt/backup/ssd_mirror/var/lib/libvirt/qemu/channel/target/domain-1-opensuse_drm to be a directory but found type None instead. This is probably caused by a bug in versions 1.0.0 and earlier.
Traceback (most recent call last):
  File "/root/.local/lib/python3.9/site-packages/rdiff_backup/backup.py", line 412, in get_mirror_rorp
    return self.cache_dict[index][1]
KeyError: (b'lib', b'libvirt', b'qemu', b'channel', b'target', b'domain-1-opensuse_drm')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.local/bin/rdiff-backup", line 33, in <module>
    sys.exit(load_entry_point('rdiff-backup==2.1.3b3', 'console_scripts', 'rdiff-backup')())
  File "/root/.local/lib/python3.9/site-packages/rdiffbackup/run.py", line 37, in main
    sys.exit(main_run(sys.argv[1:]))
  File "/root/.local/lib/python3.9/site-packages/rdiffbackup/run.py", line 105, in main_run
    ret_val |= conn_act.run()
  File "/root/.local/lib/python3.9/site-packages/rdiffbackup/actions/backup.py", line 154, in run
    backup.mirror_and_increment_compat200(
  File "/root/.local/lib/python3.9/site-packages/rdiff_backup/backup.py", line 53, in mirror_and_increment_compat200
    DestS.patch_and_increment(dest_rpath, source_diffiter, inc_rpath)
  File "/root/.local/lib/python3.9/site-packages/rdiff_backup/backup.py", line 210, in patch_and_increment
    ITR(diff.index, diff)
  File "/root/.local/lib/python3.9/site-packages/rdiff_backup/rorpiter.py", line 145, in __call__
    if last_branch.can_fast_process(*args):
  File "/root/.local/lib/python3.9/site-packages/rdiff_backup/backup.py", line 590, in can_fast_process
    mirror_rorp = self.CCPP.get_mirror_rorp(index)
  File "/root/.local/lib/python3.9/site-packages/rdiff_backup/backup.py", line 414, in get_mirror_rorp
    return self._get_parent_rorps(index)[1]
  File "/root/.local/lib/python3.9/site-packages/rdiff_backup/backup.py", line 561, in _get_parent_rorps
    raise KeyError(index)
KeyError: (b'lib', b'libvirt', b'qemu', b'channel', b'target', b'domain-1-opensuse_drm')

Python 3.9.15 (64 bit)
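
For readers puzzled by the two stacked tracebacks above: the sketch below is not rdiff-backup's actual code. `MirrorCache`, `_load_parent` and `get_mirror` are hypothetical names, loosely modelled on the frames shown. It only illustrates the general Python behaviour visible in the output: when a cache lookup raises `KeyError` and the fallback inside the `except` block raises again, Python reports both exceptions, with the same key appearing twice.

```python
class MirrorCache:
    """Hypothetical stand-in for the cache seen in the traceback."""

    def __init__(self):
        # Maps an index tuple (path components as bytes) to a
        # (source entry, mirror entry) pair.
        self.cache_dict = {}

    def _load_parent(self, index):
        # Assumed fallback: try to fill the cache from the parent directory.
        # In this sketch the parent is never found, so the lookup fails again.
        raise KeyError(index)

    def get_mirror(self, index):
        try:
            return self.cache_dict[index][1]
        except KeyError:
            # Raising inside an except block chains the new exception to the
            # first one via __context__, which produces the double traceback.
            return self._load_parent(index)[1]


if __name__ == "__main__":
    index = (b"lib", b"libvirt", b"qemu")
    try:
        MirrorCache().get_mirror(index)
    except KeyError as exc:
        print("second KeyError for:", exc.args[0])
        print("chained from:", type(exc.__context__).__name__)
```

Read this way, the second traceback is probably the more informative one: it suggests that the fallback lookup could not find the entry either, which would fit the earlier warning that the path was expected to be a directory but had type None.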

ericzolf commented 1 year ago

Without answers to my questions from a previous comment and some way to reproduce the issue, I still can't fix it.

sevens commented 1 year ago

Whoops, missed that, sorry; had a lot of unrelated issues getting the beta installed and running at all and was glad I could finally get (and post) the results in the first place.

OS: Slackware64-current

Params: rdiff-backup --exclude=/var/tmp --exclude-other-filesystems /var/ /mnt/backup/ssd_mirror//var/

Filesystem source: ext4

Filesystem target: ext4 (over NFS)

Output: attached

On reproducing: I have no idea what exactly triggered it. I hadn't made backups for about 1.5 years on this source/target combo, but did a (few?) successful run(s?) before the error started to happen. I did have some interrupted runs before the error happened as well (some from killing rdiff-backup/the wrapper script with a few Ctrl+Cs, some from the NFS mount disappearing for a while (some longer, some very short, e.g. when restarting the NFS server)).

Some further info: it's a backup of /var, so it contains quite a few sockets, locks and possibly other special files. No idea if it's relevant, but the dir (or lack thereof :P ) on which it fails contains just a single socket file.

Let me know if I can provide more info and/or answer questions.

issue_211_output.gz

ericzolf commented 1 year ago

I don't want to sound rude but if you kill rdiff-backup, and your NFS isn't stable, you can't expect miracles, things will break, and probably beyond repair. I'll have a look but don't expect too much.

sevens commented 1 year ago

Of course (and it didn't sound rude to me). Though I'd hope a tool that keeps older versions of backups would at least be able to recover to a previous good backup if one exists, unless something absolutely catastrophic happens to the target dir (e.g. a significant part of the relevant data/sectors being lost on the target disk device itself, or worse). Especially as these things can also happen outside of the user's control (power outage or hardware crash on either side, network going down, etc.). Having the current and all previous backups unrecoverable would be quite annoying from a user's point of view :)

Or at least a way to manually recover would be nice (e.g. recreate missing dir+file at the target again or a forced restore to previous good backup); or if recovering the individual dir/file isn't possible at least being able to 'ignore'/'skip' it so new backups can be made again. E.g. in my case something like: rdiff-backup --check-destination-dir --ignore-broken-dir=lib/libvirt/qemu/channel/target/domain-1-opensuse_drm TARGET.

In my case I don't care about the broken dir (nor any of its history), nor especially about the entire current snapshot, but I do care about the /var backup itself (including most of its increments; at least not losing all of them).

sevens commented 1 year ago

I have been able to get rid of the warnings, by creating the missing dir (with a socket file with the appropriate name in it just in case) on both target and source. Needed to create a few more missing ones as well (similar dir but different number, e.g. domain-15-opensuse_drm). Earlier attempts to just recreate the missing dir (both with and without the socket file in it) on just the target side were unsuccessful.
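
For anyone wanting to script the workaround described above, here is a minimal sketch, assuming a Unix-like system. The helper name `make_placeholder_socket` and the socket file name used in the demo are made up for illustration; the thread does not give the real socket name. Binding an `AF_UNIX` socket to a path creates a socket file there, and the file stays on disk after the socket is closed.

```python
import os
import socket
import stat
import tempfile


def make_placeholder_socket(directory, name):
    """Create `directory` if needed and a Unix socket file `name` inside it."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # Binding an AF_UNIX socket creates the socket file on disk.
        # Note: socket paths are length-limited (roughly 100 bytes).
        sock.bind(path)
    finally:
        # Closing the socket leaves the file in place.
        sock.close()
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory; "placeholder.sock" is a hypothetical name.
    with tempfile.TemporaryDirectory() as root:
        created = make_placeholder_socket(
            os.path.join(root, "domain-1-example"), "placeholder.sock"
        )
        print(created, stat.S_ISSOCK(os.stat(created).st_mode))
```

Per the description above, the directory had to exist on both the source and the target side before the backup ran cleanly again, so such a helper would be called once for each side. Because of the path length limit, it may be necessary to change into the parent directory first and bind with a relative path.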

Removed the temporary missing dirs from source after this and did 2 more backup runs, both went fine as well.

FWIW: I made a copy of the entire bad /var backup target, I'll keep the copy around for at least quite some time, in case further investigation/testing is needed.

Weirdly enough, rdiff-backup -l TARGET didn't show the successful backup after the first run; after a second run both showed up.

Final lines on first run:

```
increments.2022-11-08T11:11:58+01:00.dir Tue Nov 8 11:11:58 2022
increments.2022-11-12T13:00:44+01:00.dir Sat Nov 12 13:00:44 2022
Current mirror: Sat Nov 12 19:21:05 2022
```

Doing *another* run directly after this made both that run *and* the previous run (that was just missing from the output before) appear:

```
increments.2022-11-12T13:00:44+01:00.dir Sat Nov 12 13:00:44 2022
increments.2022-11-12T19:21:05+01:00.dir Sat Nov 12 19:21:05 2022
increments.2022-11-21T20:15:48+01:00.dir Mon Nov 21 20:15:48 2022
Current mirror: Mon Nov 21 20:32:06 2022
```

Output from first successful run:

```
WARNING: this command line interface is deprecated and will disappear, start using the new one as described with '--new --help'.
WARNING: Previous backup seems to have failed, regressing destination now
NOTE: Regressing to date/time Sat Nov 12 19:21:05 2022
NOTE: Starting increment operation from source path /var to destination path /mnt/backup/ssd_mirror/var
WARNING: Action backup emitted warnings, see previous messages for details
```

and from the second:

```
WARNING: this command line interface is deprecated and will disappear, start using the new one as described with '--new --help'.
NOTE: Starting increment operation from source path /var to destination path /mnt/backup/ssd_mirror/var
```

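
To check from a script whether a run really added a session, the listing could be parsed and the "Current mirror" time compared before and after. The following is only a sketch: `parse_listing` is a hypothetical helper, and it assumes one entry per line in the layout quoted in this comment, with English day and month names. The real listing may use different spacing or extra header lines.

```python
from datetime import datetime

# Assumed date layout, e.g. "Sat Nov 12 19:21:05 2022"
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_listing(text):
    """Return (increment_times, current_mirror_time) parsed from listing text."""
    increments = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Current mirror:"):
            date_text = line.split(":", 1)[1].strip()
            current = datetime.strptime(date_text, TIME_FORMAT)
        elif line.startswith("increments."):
            # The file name contains no spaces; the rest of the line is the date.
            _name, date_text = line.split(None, 1)
            increments.append(datetime.strptime(date_text.strip(), TIME_FORMAT))
    return increments, current


if __name__ == "__main__":
    sample = (
        "increments.2022-11-12T13:00:44+01:00.dir Sat Nov 12 13:00:44 2022\n"
        "Current mirror: Sat Nov 12 19:21:05 2022\n"
    )
    print(parse_listing(sample))
```

With the two listings quoted in this comment, the first would still report Sat Nov 12 19:21:05 2022 as the current mirror, so a wrapper comparing that time against the start of the run would flag the missing session.
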
sevens commented 1 year ago

I was also wondering why it says it's a WARNING when no successful backup was made (nothing in rdiff-backup -l TARGET afterwards) and the program exit code is 1. I'd expect an ERROR in that case (or, in the case of a WARNING, for the program to continue after it).
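
Until the message levels are sorted out, a wrapper script can rely on the exit code rather than on the WARNING/ERROR label. A minimal sketch follows; `run_and_check` is a hypothetical helper, and the only thing taken from this thread is that the failed run exited with code 1. What other non-zero codes mean is not covered here.

```python
import subprocess
import sys


def run_and_check(command):
    """Run `command` (a list of arguments); return True only on exit code 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Treat any non-zero exit code as a failed backup, regardless of
        # whether the last message was labelled WARNING or ERROR.
        sys.stderr.write(result.stderr)
        return False
    return True


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; in a real wrapper this
    # would be the backup command line used for the job.
    print(run_and_check([sys.executable, "-c", "import sys; sys.exit(1)"]))
    print(run_and_check([sys.executable, "-c", "pass"]))
```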