markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

"Invalid argument" when trying to deduplicate as non-root; prevents root from deduplicating later #199

Open sschmitz opened 6 years ago

sschmitz commented 6 years ago

I have installed the current master (commit 1dbf731) on Linux Mint 18, Kernel:

$ uname -rv
4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018

I have an empty btrfs volume mounted on /mnt/tmp and will be using the following as a test case:

$ sudo chmod 777 .
$ dd if=/dev/urandom bs=128K count=1024 of=a # create a 128 MiB file
$ cp a b

After sync, btrfs fi df shows 256 MiB of disk usage, as expected. If I now run sudo duperemove -d a b (i.e., run duperemove as root), everything works as it should: the output shows 256 MiB of processed data and a net change in shared extents of 128 MiB, which btrfs fi df confirms.

However, if I instead run duperemove as the (unprivileged) owner of the files, I get the following:

$ duperemove -d a b
Using 128K blocks
Using hash: murmur3
Gathering file list...
Using 2 threads for file hashing phase
[0/2] (00.00%) csum: /mnt/tmp/a
[1/2] (50.00%) csum: /mnt/tmp/b
Total files:  2
Total hashes: 2048
Loading only duplicated hashes from hashfile.
Using 2 threads for dedupe phase
[0xb25050] (0000/1024) Try to dedupe extents with id ffd79097
[0xb25050] Dedupe 1 extents (id: ffd79097) with target: (116391936, 131072), "/mnt/tmp/b"
[0xb250a0] (0001/1024) Try to dedupe extents with id ffbeaa06
[0xb25050] (0000/1024) Try to dedupe extents with id ffd79097
[0xb25050] Dedupe 1 extents (id: ffd79097) with target: (116391936, 131072), "/mnt/tmp/a"
[0xb25050] (0002/1024) Try to dedupe extents with id ffa1d3a7
[0xb250a0] Dedupe 1 extents (id: ffbeaa06) with target: (91488256, 131072), "/mnt/tmp/b"
[0xb25050] Dedupe 1 extents (id: ffa1d3a7) with target: (40239104, 131072), "/mnt/tmp/b"
[0xb250a0] (0001/1024) Try to dedupe extents with id ffbeaa06
[0xb250a0] Dedupe 1 extents (id: ffbeaa06) with target: (91488256, 131072), "/mnt/tmp/a"
[0xb250a0] Dedupe for file "/mnt/tmp/b" had status (-22) "Invalid argument".
[0xb250a0] (0003/1024) Try to dedupe extents with id ff51194d
[0xb25050] (0002/1024) Try to dedupe extents with id ffa1d3a7
[0xb250a0] Dedupe 1 extents (id: ff51194d) with target: (14548992, 131072), "/mnt/tmp/b"
[0xb250a0] Dedupe for file "/mnt/tmp/a" had status (-22) "Invalid argument".
[0xb250a0] (0003/1024) Try to dedupe extents with id ff51194d
[0xb250a0] Dedupe 1 extents (id: ff51194d) with target: (14548992, 131072), "/mnt/tmp/a"
[0xb25050] Dedupe 1 extents (id: ffa1d3a7) with target: (40239104, 131072), "/mnt/tmp/a"
[0xb25050] (0004/1024) Try to dedupe extents with id fed30a3a
[0xb25050] Dedupe 1 extents (id: fed30a3a) with target: (72613888, 131072), "/mnt/tmp/b"
[0xb25050] (0004/1024) Try to dedupe extents with id fed30a3a
[0xb250a0] (0005/1024) Try to dedupe extents with id fec9f973
[0xb25050] Dedupe 1 extents (id: fed30a3a) with target: (72613888, 131072), "/mnt/tmp/a"
[0xb250a0] Dedupe 1 extents (id: fec9f973) with target: (55050240, 131072), "/mnt/tmp/b"
[0xb250a0] Dedupe for file "/mnt/tmp/a" had status (-22) "Invalid argument".
[...]
[0xb25050] (1022/1024) Try to dedupe extents with id 00549de0
[0xb25050] Dedupe 1 extents (id: 00549de0) with target: (786432, 131072), "/mnt/tmp/b"
[0xb250a0] Dedupe 1 extents (id: 00b8f179) with target: (114163712, 131072), "/mnt/tmp/a"
[0xb250a0] Dedupe for file "/mnt/tmp/b" had status (-22) "Invalid argument".
[0xb25050] (1022/1024) Try to dedupe extents with id 00549de0
[0xb250a0] (1023/1024) Try to dedupe extents with id 00511b3d
[0xb250a0] Dedupe 1 extents (id: 00511b3d) with target: (70385664, 131072), "/mnt/tmp/b"
[0xb25050] Dedupe 1 extents (id: 00549de0) with target: (786432, 131072), "/mnt/tmp/a"
[0xb25050] Dedupe for file "/mnt/tmp/b" had status (-22) "Invalid argument".
[0xb250a0] (1023/1024) Try to dedupe extents with id 00511b3d
[0xb250a0] Dedupe 1 extents (id: 00511b3d) with target: (70385664, 131072), "/mnt/tmp/a"
Kernel processed data (excludes target files): 183107584
Comparison of extent info shows a net change in shared extents of: 124387328

There's a total of 651 "Invalid argument" errors in there. The amount of processed data is lower than 256 MiB. Still, apparently there is a change in shared extents of some 118 MiB. (These numbers vary with the random data.) This is, however, not reflected by btrfs fi df, which still shows 256 MiB of disk usage.
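The figures quoted above can be cross-checked with a little arithmetic. The Python sketch below uses only numbers that appear in the output (block size, total hashes, error count, and the two byte totals); the variable names are made up for illustration. That the processed byte count equals exactly (2048 - 651) blocks is an observation about this one run, not a documented property of duperemove.

```python
# All values are copied from the report above; the names are illustrative only.
BLOCK = 128 * 1024        # "Using 128K blocks"
TOTAL_HASHES = 2048       # "Total hashes: 2048" (two 128 MiB files)
ERRORS = 651              # number of "Invalid argument" lines reported
PROCESSED = 183107584     # "Kernel processed data", in bytes
NET_CHANGE = 124387328    # "net change in shared extents", in bytes

MIB = 1024 * 1024

print(PROCESSED / MIB)     # 174.625 (less than the expected 256 MiB)
print(NET_CHANGE / MIB)    # 118.625 (the "some 118 MiB" mentioned above)

# Both totals are whole multiples of the block size.
processed_blocks = PROCESSED // BLOCK
changed_blocks = NET_CHANGE // BLOCK
print(processed_blocks, changed_blocks)   # 1397 949

# In this run, processed blocks == total hashes minus failed requests.
print(processed_blocks == TOTAL_HASHES - ERRORS)   # True
```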

If I run duperemove again unchanged, I get the same number of error messages, the same amount of processed data, and a net change of 0.
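To compare error counts between runs, the output can be saved to a file and tallied. This is a minimal sketch, assuming the error lines always have the form shown in the log above; the regular expression and the function name tally_errors are illustrative and not part of duperemove.

```python
import re
from collections import Counter

# Matches lines such as:
#   [0xb250a0] Dedupe for file "/mnt/tmp/b" had status (-22) "Invalid argument".
# The pattern is inferred from the log in this report, not from duperemove's source.
ERROR_RE = re.compile(
    r'Dedupe for file "(?P<path>.+)" had status \((?P<status>-?\d+)\)'
)

def tally_errors(log_text):
    """Count failed dedupe requests per (path, status) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group("path"), int(match.group("status")))] += 1
    return counts

# A few lines copied from the output above.
sample = """\
[0xb250a0] Dedupe 1 extents (id: ffbeaa06) with target: (91488256, 131072), "/mnt/tmp/a"
[0xb250a0] Dedupe for file "/mnt/tmp/b" had status (-22) "Invalid argument".
[0xb250a0] Dedupe for file "/mnt/tmp/a" had status (-22) "Invalid argument".
[0xb250a0] Dedupe for file "/mnt/tmp/a" had status (-22) "Invalid argument".
"""

counts = tally_errors(sample)
for (path, status), n in sorted(counts.items()):
    print(path, status, n)
print("total errors:", sum(counts.values()))
```

Running the same tally on the saved output of two consecutive runs makes it easy to check whether the same files fail the same number of times.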

If I now run duperemove as root on those files, it does not show any errors, as in the very first case. It also shows the full 256 MiB having been processed. The net change in shared extents, however, is 0, and btrfs fi df confirms that no deduplication has taken place.

This means that, on my machine, a) only root can deduplicate files, and b) once any unprivileged user has tried to deduplicate a set of files, those files can no longer be deduplicated, even by root.

That is ... bizarre. Note that I am not using a hashfile here, so all persistent state between program runs must somehow be in the metadata of the file system.

axkibe commented 5 years ago

Trying on XFS, I can reproduce this in a much simpler way: it just never works. It won't dedupe a and b, ever.

Unless I make a third copy of it???

axkibe commented 5 years ago

I debugged further into this.

The "sync" breaks it!?

Without running sync after the cp, duperemove works; running sync makes it not detect the dupe...

axkibe commented 5 years ago

Okay I figured out what is going on.

"duperemove" in its default mode doesn't actually detect logically duplicate files at all, but bases its checksumming on physical "file extents" (I had never heard of these before). That is, if two files are logically identical but fragmented differently on disk, they are not going to get deduped.

I don't know if this is fully by design or a misunderstanding. In your case (as with my copy), the files "a" and "b" just got fragmented differently, and thus are not deduped.

Workaround: run "sudo xfs_fsr" before running duperemove and it will likely turn out fine and dedupe the files as they were defragmented before...

axkibe commented 5 years ago

Okay I'm stupid and didn't understand the whole tool/process, please take #218 into account.

renich commented 4 years ago

I'm seeing this on btrfs as well:

[0x55b065eba800] Dedupe for file "/home/renich/.cache/tracker/meta.db" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/.minikube/machines/minikube/minikube.rawdisk" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/free/samples/bass.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/src/grupobons/data/journal/WiredTigerPreplog.0000000002" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/src/grupobons/data/journal/WiredTigerLog.0000000007" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Desktop/podman/gitea/db/ib_logfile0" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Desktop/podman/gitea/db/ib_logfile1" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/world war has been fought.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/war is profit.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/striking towns.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/stereo strings.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/sine.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/poverty of the million.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/sinth.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/strings.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/noise.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/piano.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/pinochet dies.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/nafta us on mex.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/people of darfur.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/nafta fails promises on mexico.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/death toll bagdad.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/babble harp.wav" had status (-22) "Invalid argument".

lorddoskias commented 4 years ago

What kernel version are you using? Are those files on a different subvolume from the main one? Also, can you provide the sizes of the files? Can you provide two of those .wav files for me to test?

renich commented 4 years ago

[renich@introdesk ~]$ uname -a
Linux introdesk.g02.org 5.7.7-200.fc32.x86_64 #1 SMP Wed Jul 1 19:53:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

They're on a sub-volume. The subvolume is /home.

Yes, I can provide the sizes and the wav files gladly.

$ while IFS= read -r f; do ls -lZ "$f"; done < files
-rw-------. 1 renich renich unconfined_u:object_r:cache_home_t:s0 465920000 Jul  8 12:30 /home/renich/.cache/tracker/meta.db
-rw-r--r--. 1 renich renich unconfined_u:object_r:user_home_t:s0 51200000000 Jun 29 19:02 /home/renich/.minikube/machines/minikube/minikube.rawdisk
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 58455446 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/free/samples/bass.wav'
-rw-------. 1 renich renich system_u:object_r:container_file_t:s0:c189,c932 104857600 Mar  6 01:22 /home/renich/src/grupobons/data/journal/WiredTigerPreplog.0000000002
-rw-------. 1 renich renich system_u:object_r:container_file_t:s0:c189,c932 104857600 Mar  6 01:32 /home/renich/src/grupobons/data/journal/WiredTigerLog.0000000007
-rw-rw----. 1 100998 100998 system_u:object_r:container_file_t:s0:c350,c431 50331648 Jun  2 20:40 /home/renich/Desktop/podman/gitea/db/ib_logfile0
-rw-rw----. 1 100998 100998 system_u:object_r:container_file_t:s0:c350,c431 50331648 Jun  2 20:23 /home/renich/Desktop/podman/gitea/db/ib_logfile1
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/world war has been fought.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/war is profit.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/striking towns.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/stereo strings.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/sine.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/poverty of the million.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 69120124 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/sinth.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 69120124 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/strings.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 69120124 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/noise.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 69120124 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/transmision/samples/piano.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/pinochet dies.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/nafta us on mex.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/people of darfur.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/nafta fails promises on mexico.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/death toll bagdad.wav'
-rwx------. 1 renich renich unconfined_u:object_r:user_home_t:s0 88012924 Aug 26  2017 '/home/renich/Bitwig Studio/Projects/introbella/invierno/samples/babble harp.wav'

The files are attached: samples.tar.gz

Also, the songs can be found here: http://jamen.do/t/762512 ;D

lorddoskias commented 4 years ago

I wasn't able to reproduce this using the default settings from master. However, if I switch to using a v2 (old) file, where dedupe is performed at block (rather than extent) granularity, I can see a bunch of identical blocks and can subsequently deduplicate them. Please provide the command line with which you are running duperemove, and also run it with the --debug option and provide the resulting log. Ideally, run it on just those files that produce the EINVAL errors, to reduce noise.
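For readers unfamiliar with the notation: the status (-22) in the logs and the name EINVAL refer to the same error. The short Python sketch below decodes such a status, assuming the log prints a negated errno value, as the accompanying "Invalid argument" text suggests. The helper name describe_status is made up for illustration.

```python
import errno
import os

def describe_status(status):
    """Map a negative errno-style status to its symbolic name and message."""
    code = abs(status)
    name = errno.errorcode.get(code, "UNKNOWN")
    # The message text comes from the C library and may differ by platform.
    return name, os.strerror(code)

name, message = describe_status(-22)
print(name, "-", message)
```

On a typical Linux system the message is the same "Invalid argument" string that appears in the logs. The error code by itself does not say which argument the kernel rejected.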

renich commented 4 years ago

OK, I will try using v2 and report back.

The command line I am using is: duperemove -dhr --hashfile=$HOME/.local/lib/duperemove.db $HOME/
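One way to narrow a run down to just the failing files, as requested earlier in the thread, is to pull the paths out of a saved log and build a smaller command from them. This is only a sketch: the regular expression is inferred from the log lines quoted above, the helper names are hypothetical, and the only duperemove options used (-d and --debug) are the ones already mentioned in this thread.

```python
import re
import shlex

# Pattern inferred from the error lines quoted in this thread (an assumption).
ERROR_RE = re.compile(r'Dedupe for file "(?P<path>.+)" had status \(-22\)')

def failing_paths(log_text):
    """Return each path that produced a -22 status, in first-seen order."""
    paths = (m.group("path") for m in map(ERROR_RE.search, log_text.splitlines()) if m)
    return list(dict.fromkeys(paths))  # drops repeats, keeps order

def debug_command(paths):
    """Build an argument list for a debug run on just these files."""
    return ["duperemove", "-d", "--debug", *paths]

# Sample lines copied from the log above; the first is repeated on purpose
# to show that repeated paths are collapsed.
sample = """\
[0x55b065eba800] Dedupe for file "/home/renich/.cache/tracker/meta.db" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/Bitwig Studio/Projects/introbella/free/samples/bass.wav" had status (-22) "Invalid argument".
[0x55b065eba800] Dedupe for file "/home/renich/.cache/tracker/meta.db" had status (-22) "Invalid argument".
"""

cmd = debug_command(failing_paths(sample))
# shlex.join quotes paths containing spaces so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The argument list can also be passed directly to subprocess.run, which avoids shell quoting problems with paths such as those under "Bitwig Studio".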