kimono-koans / httm

Interactive, file-level Time Machine-like tool for ZFS/btrfs/nilfs2 (and even Time Machine and Restic backups!)
https://crates.io/crates/httm
Mozilla Public License 2.0

httm fails to run with 'Error: No such file or directory (os error 2)' #106

Closed: cbreak-black closed this issue 8 months ago

cbreak-black commented 8 months ago

The latest version of httm (installed via httm_0.37.0-1_amd64.deb) fails with "Error: No such file or directory (os error 2)", while the previous version, 0.36.5 (installed via httm_0.36.5-1_amd64.deb), seems to work. This happens when invoking httm without arguments or with httm -b for browsing, but httm -h works either way.

# With 0.37.0
cbreak@twilight:~$ httm .bashrc       
Error: No such file or directory (os error 2)

# With 0.36.5
cbreak@twilight:~$ httm .bashrc       
───────────────────────────────────────────────────────────────────────────────────────────────────────────
Sat Jan 02 14:21:02 2021  3.7 KiB  "/home/cbreak/.zfs/snapshot/autosnap_2024-03-11_15:00:20_hourly/.bashrc"
───────────────────────────────────────────────────────────────────────────────────────────────────────────
Sat Jan 02 14:21:02 2021  3.7 KiB  "/home/cbreak/.bashrc"
───────────────────────────────────────────────────────────────────────────────────────────────────────────
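The text after "Error:" matches how Rust's std::io::Error renders a raw OS error through its Display impl: the strerror message followed by "(os error N)", where 2 is ENOENT on Linux. The minimal sketch below only reproduces that formatting; it says nothing about which call inside httm actually failed.

```rust
use std::io::{self, ErrorKind};

/// Builds the error an OS call returns on ENOENT (raw code 2 on Linux).
fn enoent() -> io::Error {
    io::Error::from_raw_os_error(2)
}

fn main() {
    let err = enoent();
    // Display renders "<strerror text> (os error <code>)".
    println!("Error: {err}");
    // Both the raw code and the portable ErrorKind are recoverable.
    println!("raw = {:?}, kind = {:?}", err.raw_os_error(), err.kind());
    if err.kind() == ErrorKind::NotFound {
        println!("this is ENOENT");
    }
}
```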

I've attached an strace from such a failure. The only interesting part seems to be the following:

openat(AT_FDCWD, "/sys/fs/cgroup/cpu.max", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
sched_getaffinity(0, 128, [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]) = 8
rt_sigaction(SIGRT_1, {sa_handler=0x7153f74bc8a0, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SIGINFO, sa_restorer=0x7153f746a990}, NULL, 8) = 0

I don't know whether this is related to the error: it doesn't seem like a reason to fail, and the working version performs the same operations, with the same outcome, but does NOT fail. Maybe some other part of the code reads a stale errno value.
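For what it's worth, the cpu.max probe followed by sched_getaffinity looks like what Rust's std::thread::available_parallelism does on Linux: it checks the cgroup v2 CPU quota, then counts the CPUs in the affinity mask. That httm reaches it through its rayon thread pool is an assumption. A missing cpu.max (the root cgroup has none) is handled internally as "no quota" rather than surfacing as an error, which would fit both versions making the same calls. The snippet simply calls that API:

```rust
use std::thread;

fn main() {
    // On Linux this may read cgroup quota files and query the affinity mask;
    // a missing /sys/fs/cgroup/cpu.max does not by itself make it return Err.
    match thread::available_parallelism() {
        Ok(n) => println!("available parallelism: {n}"),
        Err(e) => eprintln!("could not determine parallelism: {e}"),
    }
}
```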

This is reproducible on my system (Ubuntu 23.10, zfs-2.2.0~rc3-0ubuntu4).

kimono-koans commented 8 months ago

> This is reproducible on my system (Ubuntu 23.10, zfs-2.2.0~rc3-0ubuntu4).

Would you mind building from source on your system? Instructions are in the README. My guess is that it's related to a libc mismatch between the build environment and 23.10.

cbreak-black commented 8 months ago

Building on my system produces binaries that fail in the same way as the ones in the .deb package. (I built with Rust 1.75 from Ubuntu's package repository; the build succeeded without obvious errors.)

cbreak-black commented 8 months ago

git bisect identified commit edecedb365b21c68041ce134dcd4de1a92f5aa14 as the first bad commit. In particular, the problem goes away with the following change (I don't speak Rust, but it could be because the collected HttmResult contains that error):

diff --git a/src/parse/snaps.rs b/src/parse/snaps.rs
index da116c80..b9e088df 100644
--- a/src/parse/snaps.rs
+++ b/src/parse/snaps.rs
@@ -59,7 +59,7 @@ impl MapOfSnaps {
     pub fn new(map_of_datasets: &HashMap<PathBuf, DatasetMetadata>) -> HttmResult<Self> {
         let map_of_snaps: HashMap<PathBuf, Vec<PathBuf>> = map_of_datasets
             .par_iter()
-            .map(|(mount, dataset_info)| {
+            .flat_map(|(mount, dataset_info)| {
                 let snap_mounts: HttmResult<Vec<PathBuf>> = match dataset_info.fs_type {
                     FilesystemType::Zfs | FilesystemType::Nilfs2 | FilesystemType::Apfs => {
                         Self::from_defined_mounts(mount, dataset_info)
@@ -72,7 +72,7 @@ impl MapOfSnaps {

                 snap_mounts.map(|snap_mounts| (mount.clone(), snap_mounts))
             })
-            .collect::<HttmResult<_>>()?;
+            .collect();

         if map_of_snaps.is_empty() {
             Err(HttmError::new("httm could not find any valid datasets on the system.").into())
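The bisected change hinges on standard Rust iterator semantics, which rayon's parallel iterators follow here as well: collecting an iterator of Results into a Result stops at the first Err and returns it, whereas flat_map treats each Result as an iterator of zero or one items (Result implements IntoIterator), so every Err is silently dropped. The sequential std-only sketch below shows both; snap_mounts is a hypothetical stand-in that fails with ENOENT for "/snap".

```rust
use std::collections::HashMap;
use std::io;
use std::path::PathBuf;

/// Hypothetical stand-in for reading one mount's snapshot directory:
/// "/snap" fails with ENOENT, every other mount yields one snapshot.
fn snap_mounts(mount: &str) -> io::Result<Vec<PathBuf>> {
    if mount == "/snap" {
        Err(io::Error::from_raw_os_error(2))
    } else {
        Ok(vec![PathBuf::from(format!("{mount}/.zfs/snapshot/example"))])
    }
}

/// `map` + collect into a Result: the first Err aborts the whole collect.
fn collect_strict(mounts: &[&str]) -> io::Result<HashMap<PathBuf, Vec<PathBuf>>> {
    mounts
        .iter()
        .copied()
        .map(|m| snap_mounts(m).map(|snaps| (PathBuf::from(m), snaps)))
        .collect()
}

/// `flat_map`: an Err yields no items, so failing mounts silently vanish.
fn collect_lenient(mounts: &[&str]) -> HashMap<PathBuf, Vec<PathBuf>> {
    mounts
        .iter()
        .copied()
        .flat_map(|m| snap_mounts(m).map(|snaps| (PathBuf::from(m), snaps)))
        .collect()
}

fn main() {
    let mounts = ["/home/cbreak", "/snap"];
    match collect_strict(&mounts) {
        Ok(map) => println!("strict: {} dataset(s)", map.len()),
        Err(e) => println!("strict: Error: {e}"),
    }
    println!("lenient: {} dataset(s)", collect_lenient(&mounts).len());
}
```

With one bad mount, the strict version returns the ENOENT error while the lenient version keeps the one good dataset.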
kimono-koans commented 8 months ago

Appreciate this detailed response.

Just to give you some background: to support btrfs, I made some changes so that we report back certain errors when btrfs fails, errors I previously would have just flattened into non-results before going about our/my day. This also meant I could accept all errors at this point in the program, including certain IO errors from ZFS (and other) datasets.

I decided to accept all non-permission-related errors. Given your issue, AFAICT there is no reason not to just flatten all errors here too, although I'd like to know a little more about why this happens.
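A minimal sketch of the policy described above: permission errors are flattened away, any other error aborts. The function name and the (mount, result) input shape are hypothetical, not httm's actual code. Feeding it the two failures reported later in this thread shows why /snap broke startup while /root did not.

```rust
use std::io::{self, ErrorKind};
use std::path::PathBuf;

/// Hypothetical policy: skip mounts that fail with PermissionDenied,
/// but propagate any other error to the caller.
fn skip_permission_errors<T>(
    per_mount: Vec<(PathBuf, io::Result<T>)>,
) -> io::Result<Vec<(PathBuf, T)>> {
    let mut kept = Vec::new();
    for (mount, res) in per_mount {
        match res {
            Ok(value) => kept.push((mount, value)),
            // EACCES (13) maps to PermissionDenied: flatten it away.
            Err(e) if e.kind() == ErrorKind::PermissionDenied => continue,
            // Anything else, e.g. ENOENT (2), aborts the whole scan.
            Err(e) => return Err(e),
        }
    }
    Ok(kept)
}

fn main() {
    let per_mount: Vec<(PathBuf, io::Result<usize>)> = vec![
        (PathBuf::from("/root"), Err(io::Error::from_raw_os_error(13))),
        (PathBuf::from("/home/cbreak"), Ok(1)),
        (PathBuf::from("/snap"), Err(io::Error::from_raw_os_error(2))),
    ];
    match skip_permission_errors(per_mount) {
        Ok(kept) => println!("kept {} mount(s)", kept.len()),
        Err(e) => println!("Error: {e}"),
    }
}
```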

This branch should fix the issue but also print a more detailed error: https://github.com/kimono-koans/httm/tree/fix_error_re_snap_paths

Would you mind terribly building and running it, and letting me know what may be special about the mount(s) that trigger the failure?

Some additional questions: Do you have any non-ZFS supported datasets on the system, such as btrfs or nilfs2? Is there any reason reading from the ZFS snapshot virtual directory (.zfs/snapshot) might fail? Permissions are the most obvious reason, but again, we specifically still flatten all of those errors.

cbreak-black commented 8 months ago

I get the following output:

Some(13) at mount path "/root"
Some(2) at mount path "/snap"
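These values look like the Debug output of std::io::Error::raw_os_error(), which returns an Option<i32>: 13 is EACCES (permission denied) and 2 is ENOENT (no such file or directory). Assuming the branch formats its message roughly like this (its source is not quoted here, so describe is hypothetical), the sketch reproduces the lines and maps each code to its portable ErrorKind.

```rust
use std::io;
use std::path::Path;

/// Hypothetical formatter matching the diagnostic lines above.
fn describe(err: &io::Error, mount: &Path) -> String {
    format!("{:?} at mount path {:?}", err.raw_os_error(), mount)
}

fn main() {
    for (code, mount) in [(13, "/root"), (2, "/snap")] {
        let err = io::Error::from_raw_os_error(code);
        // 13 -> PermissionDenied, 2 -> NotFound
        println!("{} ({:?})", describe(&err, Path::new(mount)), err.kind());
    }
}
```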

The /snap mountpoint comes from Ubuntu's Snap packaging system:

rpool/ROOT/ubuntu-live on /snap type zfs (rw,relatime,xattr,posixacl,casesensitive)

I think it's a bind mount or something similar (I'm not sure exactly what it is). It's not an actual ZFS dataset, but it seems to inherit the zfs type from my boot filesystem. Since it's not an actual ZFS dataset, it doesn't have a hidden .zfs directory either.
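That would account for the Some(2): reading <mount>/.zfs/snapshot under a directory that has no .zfs fails with ENOENT. A minimal std-only reproduction, using an ordinary temporary directory as a stand-in for /snap (the directory name is arbitrary):

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Lists the entries of `<mount>/.zfs/snapshot`, the ZFS snapshot directory.
fn list_snapshots(mount: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(mount.join(".zfs").join("snapshot"))? {
        names.push(entry?.file_name().to_string_lossy().into_owned());
    }
    Ok(names)
}

fn main() -> io::Result<()> {
    // An ordinary directory: it exists, but nothing provides a .zfs inside it.
    let fake_mount = std::env::temp_dir().join("httm_issue_106_not_a_dataset");
    fs::create_dir_all(&fake_mount)?;
    match list_snapshots(&fake_mount) {
        Err(e) if e.kind() == ErrorKind::NotFound => {
            println!("no .zfs/snapshot under {:?}: {e}", fake_mount)
        }
        other => println!("unexpected result: {other:?}"),
    }
    Ok(())
}
```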

kimono-koans commented 8 months ago

Very interesting!

/snap on my Ubuntu system isn't reported like this. And good point: I may have to experiment with bind mounts and similar things to see how they behave as well.

For now, I'm simply going to flatten all errors from this function, i.e. our ZFS defined-mount parsing. I'll probably have an update out later today, and I'll close this issue at that time.
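One way to flatten everything while still surfacing diagnostics is to log each failing mount and fail only when no dataset survives, mirroring the is_empty() check in the diff above. This is a hypothetical sketch (build_map and the warning text are invented), not necessarily what the fix does.

```rust
use std::collections::HashMap;
use std::io;
use std::path::PathBuf;

/// Per-mount outcome of scanning for snapshots.
type SnapResult = io::Result<Vec<PathBuf>>;

/// Hypothetical: keep every mount that succeeded, warn about the rest,
/// and fail only if nothing usable is left.
fn build_map(
    per_mount: Vec<(PathBuf, SnapResult)>,
) -> Result<HashMap<PathBuf, Vec<PathBuf>>, String> {
    let mut map = HashMap::new();
    for (mount, res) in per_mount {
        match res {
            Ok(snaps) => {
                map.insert(mount, snaps);
            }
            Err(e) => eprintln!("WARN: skipping {mount:?}: {e}"),
        }
    }
    if map.is_empty() {
        Err("httm could not find any valid datasets on the system.".to_string())
    } else {
        Ok(map)
    }
}

fn main() {
    let per_mount = vec![
        (PathBuf::from("/home/cbreak"), Ok(vec![PathBuf::from("/home/cbreak/.zfs/snapshot/example")])),
        (PathBuf::from("/snap"), Err(io::Error::from_raw_os_error(2))),
    ];
    match build_map(per_mount) {
        Ok(map) => println!("{} dataset(s) usable", map.len()),
        Err(msg) => eprintln!("Error: {msg}"),
    }
}
```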

Thanks. Appreciate your help.

cbreak-black commented 8 months ago

This might be because I boot off ZFS, so my root filesystem is ZFS. The findmnt tool reports it as follows (heavily truncated):

TARGET                                   SOURCE                                         FSTYPE OPTIONS
/                                        rpool/ROOT/ubuntu-live                         zfs    rw,relatime,xattr,posixacl,casesensitive
├─/snap                                  rpool/ROOT/ubuntu-live[/snap]                  zfs    rw,relatime,xattr,posixacl,casesensitive
├─/home                                  rpool/USERDATA                                 zfs    rw,noatime,xattr,posixacl,casesensitive
│ └─/home/cbreak                         rpool/USERDATA/cbreak                          zfs    rw,noatime,xattr,posixacl,casesensitive
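The [/snap] suffix is findmnt showing the mount's filesystem root: rpool/ROOT/ubuntu-live is mounted at /snap starting from its /snap subdirectory rather than its top level, which is what a bind mount of a subdirectory looks like. The same value is field 4 ("root") of /proc/self/mountinfo. The sketch below flags zfs mounts whose root is not "/"; it is a heuristic for illustration, not how httm detects datasets, and the sample lines are made up (mount IDs and device numbers invented).

```rust
/// Fields of interest from one /proc/self/mountinfo line.
#[derive(Debug, PartialEq)]
struct MountInfo<'a> {
    root: &'a str,
    mount_point: &'a str,
    fs_type: &'a str,
    source: &'a str,
}

/// Fields 4 and 5 are root and mount point; after the "-" separator come
/// fs type and source. Octal escapes (e.g. \040) are not decoded here.
fn parse_mountinfo(line: &str) -> Option<MountInfo<'_>> {
    let mut fields = line.split_whitespace();
    let root = fields.nth(3)?;
    let mount_point = fields.next()?;
    // Skip mount options and optional fields up to the separator.
    let mut rest = fields.skip_while(|f| *f != "-");
    rest.next()?; // the "-" itself
    let fs_type = rest.next()?;
    let source = rest.next()?;
    Some(MountInfo { root, mount_point, fs_type, source })
}

/// Heuristic: a zfs mount whose root is not "/" exposes a subdirectory
/// of a dataset, so it has no .zfs directory of its own.
fn is_zfs_subdir_mount(m: &MountInfo<'_>) -> bool {
    m.fs_type == "zfs" && m.root != "/"
}

fn main() {
    let sample = [
        "101 1 0:25 / / rw,relatime shared:1 - zfs rpool/ROOT/ubuntu-live rw,xattr,posixacl",
        "102 101 0:25 /snap /snap rw,relatime shared:2 - zfs rpool/ROOT/ubuntu-live rw,xattr,posixacl",
        "103 101 0:30 / /home/cbreak rw,noatime shared:3 - zfs rpool/USERDATA/cbreak rw,xattr,posixacl",
    ];
    for line in sample {
        if let Some(m) = parse_mountinfo(line) {
            println!("{} ({}) subdir mount: {}", m.mount_point, m.source, is_zfs_subdir_mount(&m));
        }
    }
}
```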

Thanks for the quick fix!

kimono-koans commented 8 months ago

https://github.com/kimono-koans/httm/releases/tag/0.37.1