pkolaczk / fclones

Efficient Duplicate File Finder
MIT License
1.92k stars 73 forks source link

Failure to read creation time on ZFS #55

Closed fryfrog closed 3 years ago

fryfrog commented 3 years ago

It looks like stat does not report creation time from zfs properly, listing now "birth". I assume what ever fclones is using is doing similar and not getting a creation time reported. I'm still digging around for details, Does ZFS store "Birth Time" or "Creation Time" ? is what I've uncovered so far.

failures:

---- dedupe::test::test_partition_respects_keep_patterns stdout ----
[2021-06-05 20:02:23.458] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/keep/file_3: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/keep/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/keep/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/keep/file_1: creation time is not available for the filesystem
thread 'dedupe::test::test_partition_respects_keep_patterns' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "Could not determine files to drop in group with hash 00000000000000000000000000000000 and len 0: Metadata of some files could not be read." }', src/dedupe.rs:904:68
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- dedupe::test::test_partition_respects_drop_patterns stdout ----
[2021-06-05 20:02:23.458] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_3: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_1: creation time is not available for the filesystem
thread 'dedupe::test::test_partition_respects_drop_patterns' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "Could not determine files to drop in group with hash 00000000000000000000000000000000 and len 0: Metadata of some files could not be read." }', src/dedupe.rs:923:68

---- dedupe::test::test_partition_respects_creation_time_priority stdout ----
[2021-06-05 20:02:23.458] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/ctime_priority/file_3: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/ctime_priority/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/ctime_priority/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.459] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/ctime_priority/file_1: creation time is not available for the filesystem
thread 'dedupe::test::test_partition_respects_creation_time_priority' panicked at 'called `Result::unwrap()` on an `Err` value: Error { message: "Could not determine files to drop in group with hash 00000000000000000000000000000000 and len 0: Metadata of some files could not be read." }', src/dedupe.rs:856:80

---- dedupe::test::test_run_dedupe_script stdout ----
[2021-06-05 20:02:23.466] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/dedupe_script/file_3: creation time is not available for the filesystem
[2021-06-05 20:02:23.466] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/dedupe_script/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.466] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/dedupe_script/file_2: creation time is not available for the filesystem
[2021-06-05 20:02:23.466] fclones-fe24705dd771f261:  warn: Failed to read creation time of file /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/dedupe_script/file_1: creation time is not available for the filesystem
[2021-06-05 20:02:23.466] fclones-fe24705dd771f261:  warn: Could not determine files to drop in group with hash 00000000000000000000000000000000 and len 0: Metadata of some files could not be read.
thread 'dedupe::test::test_run_dedupe_script' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `2`', src/dedupe.rs:944:13

failures:
    dedupe::test::test_partition_respects_creation_time_priority
    dedupe::test::test_partition_respects_drop_patterns
    dedupe::test::test_partition_respects_keep_patterns
    dedupe::test::test_run_dedupe_script

test result: FAILED. 92 passed; 4 failed; 0 ignored; 0 measured; 0 filtered out; finished in 65.13s
0 ✓ fryfrog@apollo ~ $ ls -alh /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_1
-rw-r--r-- 1 fryfrog fryfrog 0 Jun  5 20:02 /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_1
0 ✓ fryfrog@apollo ~ $ stat /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_1
  File: /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_1
  Size: 0               Blocks: 1          IO Block: 131072 regular empty file
Device: 19h/25d Inode: 2938048     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/ fryfrog)   Gid: ( 1000/ fryfrog)
Access: 2021-06-05 20:02:23.449795401 -0700
Modify: 2021-06-05 20:02:23.449795401 -0700
Change: 2021-06-05 20:02:23.449795401 -0700
 Birth: -
0 ✓ fryfrog@apollo ~ $ sudo zdb -O rpool/ROOT/arch home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_1

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
   2938048    1   128K    512      0     512    512    0.00  ZFS plain file

0 ✓ fryfrog@apollo ~ $ sudo zdb -ddddd rpool/ROOT/arch  2938048
Dataset rpool/ROOT/arch [ZPL], ID 394, cr_txg 20, 81.7G, 1547348 objects, rootbp DVA[0]=<0:2287a77000:1000> DVA[1]=<0:2875361000:1000> [L0 DMU objset] fletcher4 uncompressed unencrypted LE contiguous unique double size=1000L/1000P birth=140448231L/140448231P fill=1547348 cksum=11e787a4a1:3022d1624a43:45bff9d1ab72ff:480dc35cda0302c4

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
   2938048    1   128K    512      0     512    512    0.00  ZFS plain file
                                               176   bonus  System attributes
        dnode flags: USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED
        dnode maxblkid: 0
        path    /home/fryfrog/.cache/paru/clone/fclones/src/fclones-0.12.0/target/test/dedupe/partition/drop/file_1
        uid     1000
        gid     1000
        atime   Sat Jun  5 20:02:23 2021
        mtime   Sat Jun  5 20:02:23 2021
        ctime   Sat Jun  5 20:02:23 2021
        crtime  Sat Jun  5 20:02:23 2021
        gen     140447745
        mode    100644
        size    0
        parent  2938047
        links   1
        pflags  840800000004
pkolaczk commented 3 years ago

Thank you for reporting this. Fclones uses Rust's fs::symlink_metadata to read creation time of a file.

fryfrog commented 3 years ago

And pull request #8509 is why this isn't in ZFS yet. :(

pkolaczk commented 3 years ago

Do you know any workaround? What is the most important is the modification time for checking if the file hasn't been modified since the time the report was generated. I wonder if I could special-case it for zfs.

If not possible to get at least the modification time, I can add a new flag to ignore modification time check.

fryfrog commented 3 years ago

There is stat output in my first post and it does show modify time, just not birth time. So if modify is all you care about, maybe just check that? Or maybe that is all you're looking for, but for some reason the underlying fs::symlink_metadata errors if birth isn't available, even if not needed?

Edit: To be clear, I'm not a developer so I'm not sure if any of the above is accurate or possible! :)

pkolaczk commented 3 years ago

Yeah, sure, np I'll investigate. So far I've found there is a third party crate for getting file times more portably, I'll give it a try.

Anyways, this probably looks like an issue to be filed against the Rust stdlib.

pkolaczk commented 3 years ago

~I can't reproduce on zfs-fuse. Trying with kernel-based zfs...~

Ok, so it looks like the issue is not really the statx call but just the fact that somehow the metadata returned from it doesn't have the creation time populated on the Rust side. I'll need to dig into the Rust stdlib source code...

So overall, fclones is usable on zfs, the only option that doesn't work is prioritizing by file creation time:

pkolaczk@p5520:/zfs/fs1$ fclones remove --priority newest <dupes.txt
[2021-06-06 07:49:59.307] fclones:  info: Started deduplicating
[2021-06-06 07:49:59.313] fclones:  warn: Failed to read creation time of file /zfs/fs1/foo3.txt: creation time is not available for the filesystem
[2021-06-06 07:49:59.313] fclones:  warn: Failed to read creation time of file /zfs/fs1/foo2.txt: creation time is not available for the filesystem
[2021-06-06 07:49:59.313] fclones:  warn: Failed to read creation time of file /zfs/fs1/foo2.txt: creation time is not available for the filesystem
[2021-06-06 07:49:59.313] fclones:  warn: Failed to read creation time of file /zfs/fs1/foo1.txt: creation time is not available for the filesystem
[2021-06-06 07:49:59.313] fclones:  warn: Could not determine files to drop in group with hash 6109f093b3fd5eb1060989c990d1226f and len 4: Metadata of some files could not be read.
[2021-06-06 07:49:59.313] fclones:  info: Processed 0 files and reclaimed 0 B space

But reading the modification date works properly:

pkolaczk@p5520:/zfs/fs1$ fclones remove --priority most-recently-modified <dupes.txt
[2021-06-06 07:50:24.263] fclones:  info: Started deduplicating
[2021-06-06 07:50:24.269] fclones:  info: Processed 2 files and reclaimed 8 B space
pkolaczk commented 3 years ago

Not a bug in the stdlib either, because the documentation for reading the created time says:

    /// The returned value corresponds to the `btime` field of `statx` on
    /// Linux kernel starting from to 4.11, the `birthtime` field of `stat` on other
    /// Unix platforms, and the `ftCreationTime` field on Windows platforms.
fryfrog commented 3 years ago

Yeah, I really think it is on ZFS for not doing statx. :(

pkolaczk commented 3 years ago

To summarize:

I think I can add another priority values for using ctime, so lack of birthtime on ZFS wouldn't be so limiting, but ctime is not exactly the same as birthtime. WDYT?

fryfrog commented 3 years ago

Isn't zfs's crtime birth time? If I were the boss of everything, I'd make zfs fix it. :)

pkolaczk commented 3 years ago

Well, if I only knew how to access crtime... I can easily access ctime (change time), but not crtime. Anyway, I'll check that - maybe there is some way to get down to the raw statx result ;)

pkolaczk commented 3 years ago

I know nothing about zfs, but that crtime looks pretty non-standard. It is not even present in the raw (deprecated) stat structure:

https://doc.rust-lang.org/nightly/std/os/linux/raw/struct.stat.html https://docs.rs/libc/0.2.95/libc/struct.statx.html

pkolaczk commented 3 years ago

I'm closing this because:

patrickwolf commented 1 year ago

It's not actually ZFS specific as I get the same issues with BTRFS. And based on this post it's not even ZFX/BTRFS but a Linux issue with the stat system call https://askubuntu.com/questions/470134/how-do-i-find-the-creation-time-of-a-file/980750#980750