kimono-koans / httm

Interactive, file-level Time Machine-like tool for ZFS/btrfs/nilfs2 (and even Time Machine and Restic backups!)
https://crates.io/crates/httm
Mozilla Public License 2.0
1.36k stars, 29 forks

Snapshot metadata corruption - check for deleted files #32

Closed. TheDrifter363 closed this 2 years ago

TheDrifter363 commented 2 years ago

Hi,

So I ran this command: "httm -d -n -R --no-live ~ > deleted-files.txt" and it caused high CPU usage, which locked up my server. I couldn't ssh in or do anything else. So I did an unsafe poweroff, and when I rebooted, I saw snapshot metadata corruption. I'm not sure what's going on. I'm nervous about repeating that command, because I don't know whether it will happen again. Just thought I should mention it.

kimono-koans commented 2 years ago

Appreciate you leaving a comment.

Can you tell me a little more about how this happened? httm -d -n -R --no-live ~ > deleted-files.txt should print dots to stderr to show progress. Did you see those dots at all? Have you tried the command without the redirect?

A full system hang, where you cannot ctrl + c or ssh into the box, sounds like it could be a thread deadlock issue to me. What version are you running (httm -V)?

If you do try again, would you try with the latest bits, or the latest tagged release? What about trying on a different pool, if you have one? You could also try limiting the concurrency by limiting execution to one core (taskset -c 0 httm ...).
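For a more cautious retry, here is a minimal sketch of a wrapper that combines the taskset idea above with a low scheduling priority and a time limit, so a runaway scan is stopped instead of requiring a hard poweroff. The function name run_limited and the 600 second limit are hypothetical choices for illustration, not anything httm provides. It assumes a Linux box with coreutils (timeout, nice) and, optionally, util-linux (taskset).

```shell
# run_limited SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND at the lowest CPU priority, pinned to
# core 0 when taskset is available and permitted, and stops it after SECONDS.
run_limited() {
    secs=$1
    shift
    if command -v taskset >/dev/null 2>&1 && taskset -c 0 true 2>/dev/null; then
        # Pin to a single core so the scan cannot saturate every CPU.
        timeout "$secs" taskset -c 0 nice -n 19 "$@"
    else
        # Fallback when taskset is missing or core 0 is not allowed.
        timeout "$secs" nice -n 19 "$@"
    fi
}

# Example (not executed here): only stdout is redirected, so the progress
# dots on stderr stay visible in the terminal.
# run_limited 600 httm -d -n -R --no-live ~ > deleted-files.txt
```

If the limit is reached, timeout exits with status 124, which makes it easy to tell a stopped run from one that finished on its own.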

Does ls -al -R ~ run as fast as it should (very fast)? ls -al -R does something very similar to what the httm code is doing. Is it possible that statx calls are taking longer than they should on your system?
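One rough way to answer the ls question is to time the recursive listing while discarding its output. The sketch below is only an illustration: time_walk is a hypothetical helper name, and it reports whole seconds using date +%s, so it is only useful for spotting a walk that is dramatically slow.

```shell
# time_walk DIR
# Hypothetical helper: prints how many whole seconds a recursive,
# metadata-heavy listing of DIR takes. Listing output is discarded.
time_walk() {
    dir=$1
    start=$(date +%s)
    ls -al -R "$dir" > /dev/null 2>&1
    end=$(date +%s)
    echo "$((end - start))"
}

# Example (not executed here):
# time_walk ~
```

If the number looks unreasonable for the size of the directory tree, and strace happens to be installed, running the same ls command under strace -c prints a per-syscall time summary that may show where the time goes.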

kimono-koans commented 2 years ago

I'm going to close, but would be pleased to reopen later, if you have more information to add.