Open sschmitz opened 6 years ago
Trying on XFS, I can reproduce this with a much simpler case. It just never works: it won't dedupe a and b, ever.
Unless I make a third copy of it???
I debugged further into this.
The "sync" breaks it!?
Without running sync after the cp and duperemove it works; running sync makes it not detect the dupe...
Okay I figured out what is going on.
"duperemove" in its default mode just doesn't actually detect logically duplicate files at all, but bases its checksumming on physical "file extents" (I had never heard of these before this). That is, if two files are logically identical but fragmented differently on disk, they are not going to get deduped.
I don't know if this is fully by design, or a misunderstanding, etc. In your case (as in my copy), the files "a" and "b" just got fragmented differently, and thus are not deduped.
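To illustrate the point about extents, here is a minimal toy model in Python. It is not duperemove's actual code: the function name `extent_hashes` and the two extent layouts are made up for illustration, and SHA-256 merely stands in for whatever checksum the tool uses. It shows that two byte-identical files produce no matching checksums when each extent is hashed separately and the extent boundaries differ.

```python
import hashlib


def extent_hashes(data, extent_lengths):
    """Hash each extent (a contiguous slice of the file) separately.

    extent_lengths is a hypothetical on-disk layout: the length of each
    extent in file order. Real layouts would come from the filesystem.
    """
    hashes = set()
    offset = 0
    for length in extent_lengths:
        hashes.add(hashlib.sha256(data[offset:offset + length]).hexdigest())
        offset += length
    return hashes


# Two files with the same 16 KiB of content...
content = bytes(range(256)) * 64

# ...but assumed to be fragmented differently on disk.
layout_a = [16384]          # file "a": one 16 KiB extent
layout_b = [4096, 12288]    # file "b": a 4 KiB extent plus a 12 KiB extent

same_content = hashlib.sha256(content).hexdigest() == hashlib.sha256(content).hexdigest()
shared_extents = extent_hashes(content, layout_a) & extent_hashes(content, layout_b)

print("logically identical:", same_content)
print("matching extent checksums:", len(shared_extents))
```

In this model a whole-file comparison reports a match, while the per-extent comparison finds nothing in common, which is consistent with the behaviour described above.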
Workaround: run "sudo xfs_fsr" before running duperemove and it will likely turn out fine and dedupe the files as they were defragmented before...
Okay I'm stupid and didn't understand the whole tool/process, please take #218 into account.
I'm seeing this in btrfs as well:
[0x55b065eba800] Dedupe for file "/home/renich/.cache/tracker/meta.db" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/.minikube/machines/minikube/minikube.rawdisk" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/free/samples/bass.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/src/grupobons/data/journal/WiredTigerPreplog.0000000002" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/src/grupobons/data/journal/WiredTigerLog.0000000007" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Desktop/podman/gitea/db/ib_logfile0" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Desktop/podman/gitea/db/ib_logfile1" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/world war has been fought.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/war is profit.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/striking towns.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/stereo strings.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/sine.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/poverty of the million.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/sinth.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/strings.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/noise.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/piano.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/pinochet dies.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/nafta us on mex.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/people of darfur.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/nafta fails promises on mexico.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/death toll bagdad.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/babble harp.wav" had status (-22) "Invalid argument".
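For reference, the status (-22) in the lines above is a negated errno value. A small Python sketch, assuming a Linux system where errno 22 is EINVAL, shows the mapping; the variable names are only for illustration.

```python
import errno
import os

status = -22  # value copied from the log lines above

# The log reports the error as a negative number; errno values are positive.
code = -status

name = errno.errorcode.get(code, "unknown")  # symbolic name for the errno
message = os.strerror(code)                  # human-readable description

print(name, "-", message)
```

This only decodes the number. It does not say which argument the kernel rejected.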
What kernel version are you using? Are those files on a different subvolume than the main one? Also, can you provide the sizes of the files? Can you provide 2 of those .wav files for me to test?
[renich@introdesk ~]$ uname -a
Linux introdesk.g02.org 5.7.7-200.fc32.x86_64 #1 SMP Wed Jul 1 19:53:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
They're on a sub-volume. The subvolume is /home.
Yes, I can provide the sizes and the wav files gladly.
while read f; do ls -lZ "$f"; done < files
-rw-------. 1 renich renich unconfined_u:object_r:cache_home_t:s0 465920000 Jul 8 12:30 /home/renich/.cache/tracker/meta.db
-rw-r--r--. 1 renich renich unconfined_u:object_r:user_home_t:s0 51200000000 Jun 29 19:02 /home/renich/.minikube/machines/minikube/minikube.rawdisk
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 58455446 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/free/samples/bass.wav'
-rw-------. 1 renich renich system_u:object_r:container_file_t:s0:c189,c932 104857600 Mar 6 01:22 /home/renich/src/grupobons/data/journal/WiredTigerPreplog.0000000002
-rw-------. 1 renich renich system_u:object_r:container_file_t:s0:c189,c932 104857600 Mar 6 01:32 /home/renich/src/grupobons/data/journal/WiredTigerLog.0000000007
-rw-rw----. 1 100998 100998 system_u:object_r:container_file_t:s0:c350,c431 50331648 Jun 2 20:40 /home/renich/Desktop/podman/gitea/db/ib_logfile0
-rw-rw----. 1 100998 100998 system_u:object_r:container_file_t:s0:c350,c431 50331648 Jun 2 20:23 /home/renich/Desktop/podman/gitea/db/ib_logfile1
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/world war has been fought.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/war is profit.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/striking towns.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/stereo strings.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/sine.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/poverty of the million.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 69120124 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/sinth.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 69120124 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/strings.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 69120124 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/noise.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 69120124 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/piano.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/pinochet dies.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/nafta us on mex.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/people of darfur.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/nafta fails promises on mexico.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/death toll bagdad.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26 2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/babble harp.wav'
The files are attached.
Also, the songs can be found here: http://jamen.do/t/762512 ;D samples.tar.gz
I wasn't able to reproduce using the default settings from master. However, if I switch to using a v2 (old) file, where dedupe is performed on block (rather than extent) granularity, I can see a bunch of identical blocks and can subsequently deduplicate them. Please provide the command line with which you are running duperemove, run it with the --debug option as well, and provide the resulting log. It would be ideal if you could run it just on those files that produce the EINVAL errors so as to reduce noise.
OK, I will try using v2 and report back.
The command line I am using is: duperemove -dhr --hashfile=$HOME/.local/lib/duperemove.db $HOME/
I have installed the current master (commit 1dbf731) on Linux Mint 18, Kernel:
I have an empty btrfs volume mounted on /mnt/tmp and will be using the following as a test case:
sync; btrfs fi df
expectedly shows 256 MiB of disk usage. If I now run
sudo duperemove -d a b
(i.e., run duperemove as root), everything works as it should: the output shows 256 MiB of processed data and a net change in shared extents of 128 MiB.
btrfs fi df
confirms that.
However, if I instead run duperemove as the (unprivileged) owner of the files, I get the following:
There's a total of 651 "Invalid argument" errors in there. The amount of processed data is lower than 256 MiB. Still, apparently there is a change in shared extents of some 118 MiB. (These numbers vary with the random data.) This is, however, not reflected by btrfs fi df, which still shows 256 MiB of disk usage.
If I run duperemove again unchanged, I get the same number of error messages, the same amount of processed data, and a net change of 0.
If I now run duperemove as root on those files, it does not show any errors, as in the very first case. It also shows the full 256 MiB having been processed. The net change in shared extents, however, is 0, and btrfs fi df confirms that no deduplication has taken place.
This means that, on my machine, a) only root can deduplicate files, and b) once any unprivileged user has tried to deduplicate a set of files, those files will not be able to be deduplicated anymore, even by root.
That is ... bizarre. Note that I am not using a hashfile here, so all persistent state between program runs must somehow be in the metadata of the file system.
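The exact commands used to create the test files are not shown above. As a rough sketch only, assuming two files named a and b with identical random content (128 MiB each would match the 256 MiB total mentioned), something like the following Python could set up a comparable test. The helper name `make_identical_pair` and the chunk size are assumptions for illustration, not the original test case.

```python
import os
import tempfile


def make_identical_pair(directory, size_bytes, chunk_bytes=1 << 20):
    """Write the same random data to files 'a' and 'b' in directory.

    Both files are written independently (no copy or reflink), so they
    start out as separate data that a dedupe tool could later share.
    """
    path_a = os.path.join(directory, "a")
    path_b = os.path.join(directory, "b")
    with open(path_a, "wb") as fa, open(path_b, "wb") as fb:
        remaining = size_bytes
        while remaining > 0:
            chunk = os.urandom(min(chunk_bytes, remaining))
            fa.write(chunk)
            fb.write(chunk)
            remaining -= len(chunk)
        # Flush to disk so the filesystem has allocated extents for both.
        for f in (fa, fb):
            f.flush()
            os.fsync(f.fileno())
    return path_a, path_b


if __name__ == "__main__":
    # Small demo in a temporary directory; for the scenario above one
    # would instead pass the mount point and 128 * 1024 * 1024 bytes.
    with tempfile.TemporaryDirectory() as demo_dir:
        a, b = make_identical_pair(demo_dir, 4 << 20)
        print(os.path.getsize(a), os.path.getsize(b))
```

Running duperemove on the resulting pair, first as the file owner and then as root, would be one way to check whether the behaviour described above is reproducible elsewhere.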