rck / dush

List N largest files in a directory (recursive search), print some nice graphs
GNU General Public License v3.0

dush ignores some files #2

qwerty007 closed this issue 9 years ago

qwerty007 commented 9 years ago
$ ls -lh
total 16G
-rw-r--r-- 1 karol users 818M Oct 20 11:42 Arcade_Longplay_474_Gradius_IV_-_Fukkatsu.mkv
-rw-r--r-- 1 karol users 425M Oct 20 11:34 Nintendo_WiiU_Longplay_001_New_Super_Mario_Bros_U_part_1_of_5.mkv
-rw-r--r-- 1 karol users 1.1G Oct 20 11:52 Nintendo_WiiU_Longplay_001_New_Super_Mario_Bros_U_part_2_of_5.mkv
-rw-r--r-- 1 karol users 623M Oct 20 12:29 PC_Longplay_149_Guilty_Gear_X.mkv
-rw-r--r-- 1 karol users 733M Oct 20 12:19 PC_Longplay_193_Guilty_Gear_XX_Reload.mkv
-rw-r--r-- 1 karol users 829M Oct 20 12:40 PC_Longplay_327_Ricochet_Infinity_part_1_of_5.avi
-rw-r--r-- 1 karol users 3.3G Oct 20 13:32 PC_Longplay_372_Starcraft.mkv
-rw-r--r-- 1 karol users 3.4G Oct 20 13:40 PC_Longplay_372_Starcraft_Brood_War.mkv
-rw-r--r-- 1 karol users 953M Oct 20 12:33 PC_Longplay_381_Steel_Saviour.mkv
-rw-r--r-- 1 karol users 4.1G Oct 20 13:35 PC_Longplay_393_Left_4_Dead.mkv
-rw-r--r-- 1 karol users  62M Oct 20 11:19 PC_Longplay_521_Street_Fighter_II.mkv
$ dush -t -n 20
Nintendo_WiiU_Longplay_001_New_Super_Mario_Bros_U_part_2_of_5.mkv:  1081  MB
PC_Longplay_327_Ricochet_Infinity_part_1_of_5.avi:                  828   MB
Arcade_Longplay_474_Gradius_IV_-_Fukkatsu.mkv:                      817   MB
PC_Longplay_193_Guilty_Gear_XX_Reload.mkv:                          732   MB
PC_Longplay_521_Street_Fighter_II.mkv:                              61    MB

I've installed dush from the AUR https://aur.archlinux.org/packages/dush/

rck commented 9 years ago

Sorry, but I am not able to reproduce the issue. I tried to reproduce it with the following script: https://gist.github.com/rck/b132565aa3e8304fac88

For me dush works as expected and produces the correct result, showing all files. Can you reproduce the issue with the linked script? (Be careful not to overwrite your existing files.) If possible, run the script on the same file system that holds the original files.

Any other hints? File system type? Other "strange" files in the directory, like strange encoding, umlauts, spaces, whatever?

qwerty007 commented 9 years ago

I can reproduce this issue with your script:

$ ./genfiles.sh
818+0 records in
818+0 records out
857735168 bytes (858 MB) copied, 54.6587 s, 15.7 MB/s
425+0 records in
425+0 records out
445644800 bytes (446 MB) copied, 28.3689 s, 15.7 MB/s
1100+0 records in
1100+0 records out
1153433600 bytes (1.2 GB) copied, 73.2542 s, 15.7 MB/s
623+0 records in
623+0 records out
653262848 bytes (653 MB) copied, 41.7005 s, 15.7 MB/s
733+0 records in
733+0 records out
768606208 bytes (769 MB) copied, 48.9742 s, 15.7 MB/s
829+0 records in
829+0 records out
869269504 bytes (869 MB) copied, 54.9463 s, 15.8 MB/s
3300+0 records in
3300+0 records out
3460300800 bytes (3.5 GB) copied, 221.765 s, 15.6 MB/s
3400+0 records in
3400+0 records out
3565158400 bytes (3.6 GB) copied, 227.294 s, 15.7 MB/s
953+0 records in
953+0 records out
999292928 bytes (999 MB) copied, 63.2889 s, 15.8 MB/s
4100+0 records in
4100+0 records out
4299161600 bytes (4.3 GB) copied, 271.815 s, 15.8 MB/s
62+0 records in
62+0 records out
65011712 bytes (65 MB) copied, 4.07254 s, 16.0 MB/s
$ ls
total 16G
-rw-r--r-- 1 818M Oct 23 15:46 Arcade_Longplay_474_Gradius_IV_-_Fukkatsu.mkv
-rw-r--r-- 1 425M Oct 23 15:46 Nintendo_WiiU_Longplay_001_New_Super_Mario_Bros_U_part_1_of_5.mkv
-rw-r--r-- 1 1.1G Oct 23 15:47 Nintendo_WiiU_Longplay_001_New_Super_Mario_Bros_U_part_2_of_5.mkv
-rw-r--r-- 1 623M Oct 23 15:48 PC_Longplay_149_Guilty_Gear_X.mkv
-rw-r--r-- 1 733M Oct 23 15:49 PC_Longplay_193_Guilty_Gear_XX_Reload.mkv
-rw-r--r-- 1 829M Oct 23 15:50 PC_Longplay_327_Ricochet_Infinity_part_1_of_5.avi
-rw-r--r-- 1 3.3G Oct 23 15:53 PC_Longplay_372_Starcraft.mkv
-rw-r--r-- 1 3.4G Oct 23 15:57 PC_Longplay_372_Starcraft_Brood_War.mkv
-rw-r--r-- 1 953M Oct 23 15:58 PC_Longplay_381_Steel_Saviour.mkv
-rw-r--r-- 1 4.1G Oct 23 16:03 PC_Longplay_393_Left_4_Dead.mkv
-rw-r--r-- 1  62M Oct 23 16:03 PC_Longplay_521_Street_Fighter_II.mkv
-rwxr-xr-x 1  913 Oct 23 15:44 genfiles.sh
$ dush -n 20
Nintendo_WiiU_Longplay_001_New_Super_Mario_Bros_U_part_2_of_5.mkv: 1100 MB
PC_Longplay_327_Ricochet_Infinity_part_1_of_5.avi: 829 MB
Arcade_Longplay_474_Gradius_IV_-_Fukkatsu.mkv: 818 MB
PC_Longplay_193_Guilty_Gear_XX_Reload.mkv: 733 MB
PC_Longplay_521_Street_Fighter_II.mkv: 62 MB
genfiles.sh: 0 MB

I'm using ext4, I can also reproduce it in a few other directories I've tried. It seems to be related to the longplays I've downloaded from archive.org and some other multimedia files. I've downloaded them all in the same manner, using wget.

dush seems to work fine otherwise:

$ find . -type f -size +10M -exec du -sh '{}' \; | sort -nr | head
406M    ./openarena/openarena-0.8.8.zip
397M    ./openarena/openarena-0.8.8-2-i686.pkg.tar.gz
93M     ./openarena/pkg/openarena/opt/openarena/baseoa/pak4-textures.pk3
71M     ./openarena/pkg/openarena/opt/openarena/baseoa/pak2-players.pk3
68M     ./openarena/pkg/openarena/opt/openarena/baseoa/pak6-patch088.pk3
37M     ./openarena/pkg/openarena/opt/openarena/baseoa/pak1-maps.pk3
37M     ./openarena/pkg/openarena/opt/openarena/baseoa/pak0.pk3
36M     ./openarena/pkg/openarena/opt/openarena/baseoa/pak6-patch085.pk3
26M     ./openarena/pkg/openarena/opt/openarena/baseoa/pak2-players-mature.pk3
24M     ./openarena/pkg/openarena/opt/openarena/baseoa/pak6-misc.pk3
$ dush
openarena-0.8.8.zip: 405 MB
openarena-0.8.8-2-i686.pkg.tar.gz: 396 MB
pak4-textures.pk3: 92 MB
pak2-players.pk3: 70 MB
pak6-patch088.pk3: 67 MB
pak1-maps.pk3: 36 MB
pak0.pk3: 36 MB
pak6-patch085.pk3: 35 MB
pak2-players-mature.pk3: 25 MB
pak6-misc.pk3: 23 MB

'dush' is not an alias, nothing strange or unusual AFAICT. 'ncdu' sees all files and sorts them correctly.

rck commented 9 years ago

I was able to reproduce it in a 32-bit VM. What a nice, nasty little bug. Let me guess: "uname -m" shows you are on a 32-bit platform, right? dush uses nftw(3) to walk the directory tree and calls a function for each file as long as stat(2) succeeds on it. On 32-bit, stat(2) fails with "EOVERFLOW" for big files because dush is not compiled with "-D_FILE_OFFSET_BITS=64".
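
The size limit behind this failure can be checked with a little arithmetic. The sketch below assumes that, without "-D_FILE_OFFSET_BITS=64" on a 32-bit platform, off_t is a signed 32-bit integer, so the largest representable file size is 2^31 - 1 bytes. The helper name fits_in_32bit_off_t is made up for illustration and is not part of dush; the sizes are byte counts that dd reported earlier in the thread.

```shell
#!/bin/sh
# Largest value a signed 32-bit off_t can hold: 2^31 - 1 bytes (assumption
# stated above; this is not read from dush or from the system headers).
OFF_T_32_MAX=2147483647

# Hypothetical helper: succeeds when a file of the given size in bytes
# can be represented in a signed 32-bit off_t.
fits_in_32bit_off_t() {
    [ "$1" -le "$OFF_T_32_MAX" ]
}

# Byte counts taken from the dd output in this thread.
for size in 1153433600 3460300800 4299161600; do
    if fits_in_32bit_off_t "$size"; then
        echo "$size: fits"
    else
        echo "$size: overflows a 32-bit off_t"
    fi
done
```

Run as written, it reports that 1153433600 fits and that the other two sizes overflow. It only covers the size limit; it does not model how nftw(3) or dush react once stat(2) has failed.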

This should be fixed in the current master (here on GitHub). Please update and test (do not use the PKGBUILD, it pulls in a tagged version):

$ git pull/clone
$ cd dush && mkdir build && cd build
$ cmake .. && make
$ ./dush PATH_TO_YOUR_DIR

Please confirm that the issue is fixed, I will then tag a new release and upload it to the AUR.

qwerty007 commented 9 years ago

Yes, I'm on 32-bit. I've removed every '_gittag' mention from the PKGBUILD. Does that make sense?

$ diff -Naur PKGBUILD ../dush-git/PKGBUILD
--- PKGBUILD    2013-11-26 20:13:52.000000000 +0000
+++ ../dush-git/PKGBUILD        2014-10-23 21:16:09.281331542 +0000
@@ -22,18 +22,17 @@

 _gitroot=git://github.com/rck/dush.git
 _gitname=dush
-_gittag=v0.9

 build() {
   cd "$srcdir"
   msg "Connecting to GIT server...."

   if [[ -d "$_gitname" ]]; then
-    cd "$_gitname" && git pull origin "$_gittag"
+    cd "$_gitname" && git pull origin
     msg "The local files are updated."
   else
     git clone "$_gitroot" "$_gitname"
-    cd "$_gitname" && git checkout -b "$_gittag" "$_gittag"
+    cd "$_gitname" && git checkout
   fi

   msg "GIT checkout done or server timeout"

dush seems to work fine now. I've tested it on some multimedia files and with the script you provided. BTW, the 'find' line I used previously is wrong: it should say 'sort -hr', not 'sort -nr'.
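
A small sketch of why 'sort -nr' misorders du -sh output: -n compares only the leading number and ignores the M/G suffix, while -h (a GNU coreutils option, also found in some other sort implementations) understands the suffixes. The file names below are placeholders; the sizes are borrowed from the listings in this thread.

```shell
#!/bin/sh
# Emit three du -sh style lines: size, a tab, then a placeholder file name.
sample() {
    printf '%s\t%s\n' 954M a.mkv 3.9G b.mkv 1.1G c.mkv
}

# Numeric sort: compares 954, 3.9 and 1.1 and ignores the unit suffix,
# so the 954M entry ends up first.
sample | LC_ALL=C sort -nr

echo "---"

# Human-numeric sort: knows that G is larger than M,
# so the 3.9G entry ends up first.
sample | LC_ALL=C sort -hr
```

LC_ALL=C is set only to keep the decimal point handling independent of the local locale.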

$ find . -type f -size +10M -exec du -sh '{}' \; | sort -hr | head
3.9G    ./PC_Longplay_393_Left_4_Dead.mkv
3.4G    ./PC_Longplay_372_Starcraft_Brood_War.mkv
3.3G    ./PC_Longplay_372_Starcraft.mkv
1.1G    ./Nintendo_WiiU_Longplay_001_New_Super_Mario_Bros_U_part_2_of_5.mkv
954M    ./PC_Longplay_381_Steel_Saviour.mkv
830M    ./PC_Longplay_327_Ricochet_Infinity_part_1_of_5.avi
819M    ./Arcade_Longplay_474_Gradius_IV_-_Fukkatsu.mkv
734M    ./PC_Longplay_193_Guilty_Gear_XX_Reload.mkv
624M    ./PC_Longplay_149_Guilty_Gear_X.mkv
426M    ./Nintendo_WiiU_Longplay_001_New_Super_Mario_Bros_U_part_1_of_5.mkv
$ dush -t
PC_Longplay_393_Left_4_Dead.mkv:                                    3936  MB
PC_Longplay_372_Starcraft_Brood_War.mkv:                            3400  MB
PC_Longplay_372_Starcraft.mkv:                                      3300  MB
Nintendo_WiiU_Longplay_001_New_Super_Mario_Bros_U_part_2_of_5.mkv:  1100  MB
PC_Longplay_381_Steel_Saviour.mkv:                                  953   MB
PC_Longplay_327_Ricochet_Infinity_part_1_of_5.avi:                  829   MB
Arcade_Longplay_474_Gradius_IV_-_Fukkatsu.mkv:                      818   MB
PC_Longplay_193_Guilty_Gear_XX_Reload.mkv:                          733   MB
PC_Longplay_149_Guilty_Gear_X.mkv:                                  623   MB
Nintendo_WiiU_Longplay_001_New_Super_Mario_Bros_U_part_1_of_5.mkv:  425   MB

rck commented 9 years ago

Your diff makes sense if you want to use a PKGBUILD and want to build HEAD.

Thanks for testing, the new release is in the AUR.

qwerty007 commented 9 years ago

You forgot to update the _gittag, so the PKGBUILD from the AUR still doesn't work:

$ diff -Naur PKGBUILD.old PKGBUILD
--- PKGBUILD.old        2014-10-26 13:31:43.171510892 +0000
+++ PKGBUILD    2014-10-26 13:31:52.631459643 +0000
@@ -22,7 +22,7 @@

 _gitroot=git://github.com/rck/dush.git
 _gitname=dush
-_gittag=v0.9
+_gittag=v0.10

 build() {
   cd "$srcdir"

rck commented 9 years ago

You are right, I forgot the "_gittag". I re-uploaded it to the AUR as "dush-v0.10-2".