pkolaczk / fclones

Efficient Duplicate File Finder
MIT License

[dedup] Use APFS clone (CoW) on macOS #219

Closed boozook closed 1 year ago

boozook commented 1 year ago

Bug, kind: critical, dedup-cmd broken on macOS.
Docs: man clonefile
Min macOS: 10.12 (Sierra)
FS required: APFS

TL;DR: Use cp -c instead of cp --reflink on macOS.

Explanation:

cp -c uses the clonefile syscall: the -c flag overrides the default copy behaviour and clones files via clonefile() instead (see man cp(1)). The behaviour matches that of the Linux cp flag --reflink, but the macOS cp has no --reflink parameter, so with --reflink you'll get an error like cp: illegal option....
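
As a rough sketch of the TL;DR, a wrapper script could pick the clone flag from the output of uname -s. The function name clone_flag and the file names are made up for this illustration and are not part of fclones; the sketch also assumes that every non-macOS system has a cp that accepts --reflink, which is only true for GNU cp.

```shell
# Hypothetical helper (not part of fclones): choose the cp flag that
# requests a copy-on-write clone for the given OS name (as printed by `uname -s`).
clone_flag() {
  case "$1" in
    Darwin) echo "-c" ;;          # macOS: cp -c clones via clonefile()
    *)      echo "--reflink" ;;   # assumed GNU cp elsewhere (e.g. Linux)
  esac
}

# Print the command that would be used on this machine, without running it.
echo "cp $(clone_flag "$(uname -s)") source.bin clone.bin"
```

The command is only printed, not executed, so it is safe to try on any filesystem.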


Update

It seems the problem is not with dedupe itself, but with the help documentation and the CLI output.

I had assumed that fclones without --dry-run would execute exactly those commands.

kapitainsky commented 1 year ago

Wow, indeed dedupe on macOS does nothing... but interestingly it does not throw any error either.

It should be a very easy PR to fix.

boozook commented 1 year ago

indeed dedupe on macOS does nothing...

I ran dedupe with --dry-run and inspected the output.

kapitainsky commented 1 year ago

But... here are 4 x 10G identical files.

I ran it for real and inspected the results:

$ ls -lih *
272167869 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:21 10GB.bin
272168957 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:37 10GB.bin.1
272168961 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:37 10GB.bin.2
272168969 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:38 10GB.bin.3

$ df -h /System/Volumes/Data
Filesystem     Size   Used  Avail Capacity iused      ifree %iused  Mounted on
/dev/disk1s1  932Gi  541Gi  332Gi    62% 2404443 3484848560    0%   /System/Volumes/Data

$ fclones group . | fclones dedupe
...
[2023-08-07 22:40:01.028] fclones:  info: Processed 3 files and reclaimed up to 32.2 GB space

$ ls -lih *
272167869 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:21 10GB.bin
272169042 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:37 10GB.bin.1
272169043 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:37 10GB.bin.2
272169044 -rw-r--r--@ 1 kptsky  staff    10G Aug  7 22:38 10GB.bin.3

$ df -h /System/Volumes/Data
Filesystem     Size   Used  Avail Capacity iused      ifree %iused  Mounted on
/dev/disk1s1  932Gi  511Gi  362Gi    59% 2404444 3799378320    0%   /System/Volumes/Data

There is no problem with (CoW) dedupe on APFS.
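
The figures above line up under two assumptions: each file is exactly 10 GiB, and fclones reports decimal gigabytes while ls and df report binary units. Three redundant copies of 10 GiB are then about 32.2 GB, and df shows used space dropping from 541Gi to 511Gi. A quick check of that arithmetic:

```shell
# 3 redundant copies of 10 GiB each, in bytes (assumes 64-bit shell arithmetic).
bytes=$((3 * 10 * 1024 * 1024 * 1024))

# Decimal gigabytes with one decimal place (truncated), using integer math only.
tenths=$((bytes / 100000000))
echo "$((tenths / 10)).$((tenths % 10)) GB"   # prints "32.2 GB"

# Drop in used space reported by df before and after, in GiB.
echo "$((541 - 511)) GiB"                     # prints "30 GiB"
```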

The problem is in what --dry-run shows.

Bug, kind: critical, dedup-cmd broken on macOS.

This can be downgraded from critical to cosmetic.

boozook commented 1 year ago

So how exactly does dedupe work on macOS? Hard links? I'm asking just to understand. If so, it would still be better to use clones (copy-on-write) as described above.

kapitainsky commented 1 year ago

You can see that the files do not share the same inode, so they are not hard links (this is why I added the -i option to ls).

They are proper CoW clones.
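
For anyone who wants to repeat the inode check without reading ls -i output, here is a small sketch using the shell's -ef test, which is true only when two paths resolve to the same device and inode. The function name is_hardlink and the temporary file names are made up for this example. Note that it only rules out hard links; different inodes alone do not prove that two files are CoW clones, which is why the df output above matters too.

```shell
# Hypothetical helper: succeed if both paths resolve to the same file
# (same device and inode), as hard links to each other do.
is_hardlink() {
  [ "$1" -ef "$2" ]
}

tmp=$(mktemp -d)
echo data > "$tmp/a"
ln "$tmp/a" "$tmp/b"        # hard link: shares the inode of a
cp "$tmp/a" "$tmp/c"        # ordinary copy: gets its own inode

is_hardlink "$tmp/a" "$tmp/b" && echo "a and b: same inode (hard links)"
is_hardlink "$tmp/a" "$tmp/c" || echo "a and c: different inodes"
```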

boozook commented 1 year ago

Great! So this can either be closed or stay open for docs/interface improvements. Thank you so much!

kapitainsky commented 1 year ago

Leave it open :) Something is not 100% right, at least with --dry-run.

kapitainsky commented 1 year ago

I think I have identified the problem. PR already posted.