pkolaczk / fclones

Efficient Duplicate File Finder
MIT License

Allow deduplicating read-only files #113

Open Architector4 opened 2 years ago

Architector4 commented 2 years ago

It would be nice to add another flag to the dedupe mode that makes it open files read-only for deduplication. This is useful, for example, for deduplicating files inside snapshots that have the read-only flag set (btrfs or otherwise).

The duperemove tool has such a flag: https://github.com/markfasheh/duperemove/blob/master/docs/duperemove.html#L134

Admittedly, it makes little sense to perform a write-like operation on a file opened read-only, but the flag gives duperemove one more use case. I guess it only works for a privileged user (root)?

So far I haven't found a way to deduplicate such snapshots with fclones other than knocking the read-only flag off on them.
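
To make the idea of "dedupe through a read-only file descriptor" concrete, here is a minimal Python sketch. It assumes the Linux `FIDEDUPERANGE` ioctl from `<linux/fs.h>`, which as far as I understand is the mechanism duperemove relies on; nothing in this thread confirms that, and this is not how fclones is implemented. It also assumes the kernel accepts a destination opened read-only when the caller is privileged, which is the behaviour guessed at above. The helper names (`pack_request`, `unpack_result`, `dedupe_readonly`) are hypothetical.

```python
import fcntl
import os
import struct

# Constants as defined in <linux/fs.h>.
FIDEDUPERANGE = 0xC0189436        # _IOWR(0x94, 54, struct file_dedupe_range)
FILE_DEDUPE_RANGE_SAME = 0        # status: ranges were identical and are now shared
FILE_DEDUPE_RANGE_DIFFERS = 1     # status: contents differ, nothing was shared

# struct file_dedupe_range: src_offset, src_length, dest_count, reserved1, reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")


def pack_request(src_offset, length, dest_fd, dest_offset):
    """Build a file_dedupe_range request with exactly one destination."""
    header = HEADER.pack(src_offset, length, 1, 0, 0)
    info = INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    return bytearray(header + info)


def unpack_result(buf):
    """Return (bytes_deduped, status) as written by the kernel for the destination."""
    _fd, _offset, bytes_deduped, status, _reserved = INFO.unpack_from(buf, HEADER.size)
    return bytes_deduped, status


def dedupe_readonly(src_path, dest_path, length):
    """Ask the kernel to share `length` bytes of dest with src; both opened O_RDONLY.

    Assumption: the caller is privileged enough for the kernel to accept a
    destination descriptor that is not open for writing.
    """
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        dest_fd = os.open(dest_path, os.O_RDONLY)
        try:
            buf = pack_request(0, length, dest_fd, 0)
            fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)  # buf is updated in place
            return unpack_result(buf)
        finally:
            os.close(dest_fd)
    finally:
        os.close(src_fd)
```

Calling `dedupe_readonly("/test/a", "/test/b", os.path.getsize("/test/a"))` as root on a read-only subvolume would be the corresponding experiment. The filesystem may process less than the requested length in a single call, so a real tool would loop over the file in chunks and check `status` after each one.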

pkolaczk commented 2 years ago

Does the newly added --no-lock flag help for that? Adding --no-lock will prevent opening files for writes.

Architector4 commented 2 years ago

> Does the newly added --no-lock flag help for that? Adding --no-lock will prevent opening files for writes.

No, it only replaces the single error with a larger set of them. I compiled fclones from git just now and tried it. Here is a terminal session log showing how to reproduce the test environment, with duperemove succeeding in it (redundant output abridged with "..."):

root@Architector4PC:/# btrfs subvolume create test
Create subvolume './test'
root@Architector4PC:/# cp --reflink=never /bin/bash /test/a  # with GNU cp; intentionally introduce duplicates
root@Architector4PC:/# cp --reflink=never /bin/bash /test/b
root@Architector4PC:/# btrfs property set /test ro true
root@Architector4PC:/# cd /test
root@Architector4PC:/test# pacman -Qi fclones-git | grep Version
Version         : v0.23.0.r9.9a599cf-1
root@Architector4PC:/test# fclones group . > /tmp/fcl
...
[2022-05-12 22:19:50.115] fclones:  info: Found 1 (948.6 KB) redundant files
root@Architector4PC:/test# fclones dedupe < /tmp/fcl
[2022-05-12 22:20:03.702] fclones:  info: Started deduplicating
[2022-05-12 22:20:03.707] fclones:  warn: Failed to open file /test/b for write: Read-only file system (os error 30)
[2022-05-12 22:20:03.708] fclones:  info: Processed 0 files and reclaimed up to 0 B space
root@Architector4PC:/test# fclones dedupe --no-lock < /tmp/fcl
[2022-05-12 22:20:08.157] fclones:  info: Started deduplicating
[2022-05-12 22:20:08.161] fclones:  warn: Failed to remove temporary /test/b.OsA3vFzLw2ofQbpVrA0x8262: Failed to remove file /test/b.OsA3vFzLw2ofQbpVrA0x8262: No such file or directory (os error 2)
[2022-05-12 22:20:08.161] fclones:  warn: Failed keep metadata for /test/b: Read-only file system (os error 30)
[2022-05-12 22:20:08.161] fclones:  warn: Failed keep metadata for /test: Read-only file system (os error 30)
[2022-05-12 22:20:08.161] fclones:  warn: Failed to deduplicate /test/b -> /test/a: Read-only file system (os error 30)
[2022-05-12 22:20:08.161] fclones:  info: Processed 0 files and reclaimed up to 0 B space
root@Architector4PC:/test# duperemove -rAd .
... (successfully dedupes)
Comparison of extent info shows a net change in shared extents of: 262144

Forza-tng commented 1 year ago

Hi. I am also looking for an option to dedupe across read-only snapshots. An important use case is running on a backup server that receives snapshots from many systems.