markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

Duperemove hangs - How do I obtain logs? #305

Open PrawnMan opened 10 months ago

PrawnMan commented 10 months ago

Duperemove hangs when I scan one of my mounted drives. I tried to find out where duperemove keeps its logs, but I can't seem to find them.

So, where are the logs and how do I obtain them? I'm running Ubuntu 22.04 (x64).

JackSlateur commented 10 months ago

Hello @PrawnMan

There is no log file. Debugging information is written to stdout/stderr; verbosity can be increased with the --debug option.
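Since the output only goes to stdout/stderr, one way to keep it is to send both streams to a file. This is a minimal sketch, assuming a POSIX shell and `tee`; the helper name `run_logged`, the log file name and the scan path are made up for illustration.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND, merges stderr into stdout, and copies everything to LOGFILE
# while still showing it on the terminal.
run_logged() {
    logfile="$1"
    shift
    "$@" 2>&1 | tee "$logfile"
}

# Hypothetical usage (placeholder path):
#   run_logged duperemove-debug.log duperemove -r --debug /path/to/directory
```

The resulting file can then be attached to the issue or inspected with `less`.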

Could you tell me which options you are using?

There is at least one issue with batching and partial mode, which is being worked on in this branch (could you try it?).

Best regards,

PrawnMan commented 10 months ago

I think I cloned and compiled the correct branch?

USER@HOST $ ./duperemove --version
duperemove v0.12-58-g4399

The command I ran is

./duperemove -rhv --debug /MOUNTPOINT/MusicDirectory/

Using this version caused it to hang again at the same point.

PrawnMan commented 10 months ago

Update: I did some digging and I think it might be some unicode shenanigans. It hangs on this filename, which contains a weird unicode character: 10 Major Lazer & Dj Snake feat. M<U+009D> Vs Knife Party - Lean On Vs Bonfire (Djs From Mars Club Bootleg).mp3

Trying to 'ls' that file gave me all sorts of weird terminal glitches, I had to run a

basename /path/to/directory/10 Major Lazer & Dj Snake feat. M <the terminal showed nothing after that "M"> | less

Attached is a screenshot of my terminal window showing the entirety of the above command.

Screenshot from 2023-09-10 02-59-00

I opened nautilus to that directory and copied the filename and pasted it below, as well as to a pastebin:

10 Major Lazer & Dj Snake feat. M Vs Knife Party - Lean On Vs Bonfire (Djs From Mars Club Bootleg).mp3

https://pastebin.com/82jWsQDM

I suspect that it's the strange unicode character that's causing this issue, seeing as it caused havoc in my terminal window.

EDIT: Added a screenshot showing my terminal output of the command crashing/hanging when encountering the weird character: Screenshot from 2023-09-10 03-06-12
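When a filename upsets the terminal, it can help to look at its raw bytes instead of printing the name directly. This is a minimal sketch, assuming `od`, `tr` and `basename` are available; the function names `hex_bytes` and `list_names_as_hex` are invented here.

```shell
# hex_bytes STRING
# Prints the bytes of STRING as one run of lowercase hex digits, so control
# characters are never sent to the terminal.
hex_bytes() {
    printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'
    echo
}

# list_names_as_hex DIR
# Prints the name of every (non-hidden) entry in DIR as hex bytes only.
list_names_as_hex() {
    for path in "$1"/*; do
        [ -e "$path" ] || continue
        hex_bytes "$(basename -- "$path")"
    done
}
```

In such a dump, the UTF-8 encoding of U+009D would show up as `c29d`, while Ø (U+00D8) would show up as `c398`.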

JackSlateur commented 7 months ago

Hello @PrawnMan, I checked your issue and was not able to reproduce it.

Could you try the latest code from master? There have been a lot of changes related to the scan phase, which may have fixed the issue.

PrawnMan commented 7 months ago

Hello @JackSlateur

I reran with the latest code; the same filename still causes the terminal output to hang.

This character causes it to hang: Ø

JackSlateur commented 7 months ago

Hello @PrawnMan I still cannot reproduce :\

Could you show me your filename as base64?

find -iname "yourfile" -printf "%f" | base64

Also, could you give me your environment variables (output of the env command)?

PrawnMan commented 7 months ago

env command:

$ env
SHELL=/bin/bash
LANGUAGE=en_AU:en
PWD=/home/alien
LOGNAME=alien
XDG_SESSION_TYPE=tty
MOTD_SHOWN=pam
HOME=/home/alien
LANG=en_AU.UTF-8
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
SSH_CONNECTION=192.168.0.47 35180 192.168.0.22 22
LESSCLOSE=/usr/bin/lesspipe %s %s
XDG_SESSION_CLASS=user
TERM=xterm-256color
LESSOPEN=| /usr/bin/lesspipe %s
USER=alien
SHLVL=1
XDG_SESSION_ID=16944
XDG_RUNTIME_DIR=/run/user/1000
SSH_CLIENT=192.168.0.47 35180 22
XDG_DATA_DIRS=/usr/share/gnome:/usr/local/share:/usr/share:/var/lib/snapd/desktop
PATH=/home/alien/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
SSH_TTY=/dev/pts/0
_=/usr/bin/env

Base64 filename:

$ find -iname "10 Major Lazer & Dj Snake feat. MØ Vs Knife Party - Lean On Vs Bonfire (Djs From Mars Club Bootleg).mp3" -printf "%f" | base64
MTAgTWFqb3IgTGF6ZXIgJiBEaiBTbmFrZSBmZWF0LiBNw5ggVnMgS25pZmUgUGFydHkgLSBMZWFu
IE9uIFZzIEJvbmZpcmUgKERqcyBGcm9tIE1hcnMgQ2x1YiBCb290bGVnKS5tcDM=
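The base64 text can be turned back into raw bytes to see exactly what is stored on disk. This is a minimal sketch, assuming GNU `base64` and `od`; the variable names are illustrative, and the two halves of the string are joined without the space that line wrapping introduced.

```shell
# Base64 text as posted in the thread, joined into one string.
b64='MTAgTWFqb3IgTGF6ZXIgJiBEaiBTbmFrZSBmZWF0LiBNw5ggVnMgS25pZmUgUGFydHkgLSBMZWFu'
b64="${b64}IE9uIFZzIEJvbmZpcmUgKERqcyBGcm9tIE1hcnMgQ2x1YiBCb290bGVnKS5tcDM="

# Decode back to the raw filename bytes.
decoded=$(printf '%s' "$b64" | base64 -d)

# Show the bytes in hex rather than printing the name to the terminal.
decoded_hex=$(printf '%s' "$decoded" | od -An -v -tx1 | tr -d ' \n')
printf '%s\n' "$decoded_hex"
```

In the decoded bytes, "feat. M" is followed by `c3 98`, which is the UTF-8 encoding of Ø (U+00D8), and then a space. With GNU `base64`, passing `-w0` when encoding avoids the line wrap in the first place.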

JackSlateur commented 7 months ago

Still no luck, things are working for me:

duperemove -h 10\ Major\ Lazer\ \&\ Dj\ Snake\ feat.\ MØ\ Vs\ Knife\ Party\ -\ Lean\ On\ Vs\ Bonfire\ \(Djs\ From\ Mars\ Club\ Bootleg\).mp3 -d 
Gathering file list...
    Files scanned: 1/1 (100.00%)
    Bytes scanned: 10.0MB/10.0MB (100.00%)
    File listing: completed
Hashfile "(null)" written
Loading only identical files from hashfile.
Simple read and compare of file data found 0 instances of files that might benefit from deduplication.
Nothing to dedupe.
Loading only duplicated hashes from hashfile.
Found 0 identical extents.
Simple read and compare of file data found 0 instances of extents that might benefit from deduplication.
Nothing to dedupe.

I reread this thread and noticed that you said "Trying to 'ls' that file gave me all sorts of weird terminal glitches". Do you mean that running ls "10 Major Lazer & Dj Snake feat. MØ Vs Knife Party - Lean On Vs Bonfire (Djs From Mars Club Bootleg).mp3" messes things up?

Back to duperemove: could you try to strace the smallest command that gets stuck? For instance: strace -fty duperemove "10 Major Lazer & Dj Snake feat. MØ Vs Knife Party - Lean On Vs Bonfire (Djs From Mars Club Bootleg).mp3"

PrawnMan commented 7 months ago

strace output:

https://pastebin.com/setYau4m

(it was too large to insert as a comment)

Weirdly enough, running duperemove on that file while the terminal is in that directory seems to work?

$ ~/MyApps/duperemove_2023-12-09/duperemove/duperemove 10\ Major\ Lazer\ \&\ Dj\ Snake\ feat.\ MØ\ Vs\ Knife\ Party\ -\ Lean\ On\ Vs\ Bonfire\ \(Djs\ From\ Mars\ Club\ Bootleg\).mp3
Gathering file list...
[1/1] (100.00%) csum: /media/AuroraMusic/Archive/Pre-Deezer.Tidal/Audio/Amarok/2016/Djs From Mars - Bootzilla Vol.3 (2015)/Club Edition/10 Major Lazer & Dj Snake feat. MØ Vs Knife Party - Lean On Vs Bonfire (Djs From Mars Club Bootleg).mp3
Hashfile "(null)" written
Loading only identical files from hashfile.
Simple read and compare of file data found 0 instances of files that might benefit from deduplication.
Loading only duplicated hashes from hashfile.
Found 0 identical extents.
Simple read and compare of file data found 0 instances of extents that might benefit from deduplication.

However, running duperemove three directories up leads to the same hang? It's the strangest thing. It scanned 11140 files (out of 14122) before it stopped scanning and exited.

Could it be that the path length, in conjunction with the unicode character, is causing the issue? If I scan the file directly, it passes, but if I run the scan a few directories up, it hangs again.
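One way to test the path-length idea is to compare the byte length of the full path, and of the filename alone, with the limits the file system reports. This is a minimal sketch, assuming `getconf`, `wc` and `basename` are available; `byte_len` and `check_path_limits` are names invented here, and the mount point defaults to `/` as a placeholder.

```shell
# byte_len STRING: number of bytes (not characters) in STRING.
byte_len() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# check_path_limits PATH [MOUNTPOINT]
# Prints the byte length of PATH and of its last component next to the
# PATH_MAX and NAME_MAX values that getconf reports for MOUNTPOINT.
check_path_limits() {
    fs="${2:-/}"
    echo "path: $(byte_len "$1") bytes (PATH_MAX $(getconf PATH_MAX "$fs"))"
    echo "name: $(byte_len "$(basename -- "$1")") bytes (NAME_MAX $(getconf NAME_MAX "$fs"))"
}
```

Bytes are counted rather than characters because a character such as Ø takes two bytes in UTF-8. Passing the mount point of the affected drive as the second argument reports the limits for that file system rather than for `/`.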

https://imgur.com/a/spSQ3Ni

EDIT: running the duperemove command from the home dir and pointing it directly to the file, also passes:

$ ~/MyApps/duperemove_2023-12-09/duperemove/duperemove /media/AuroraMusic/Archive/Pre-Deezer.Tidal/Audio/Amarok/2016/Djs\ From\ Mars\ -\ Bootzilla\ Vol.3\ \(2015\)/Club\ Edition/10\ Major\ Lazer\ \&\ Dj\ Snake\ feat.\ MØ\ Vs\ Knife\ Party\ -\ Lean\ On\ Vs\ Bonfire\ \(Djs\ From\ Mars\ Club\ Bootleg\).mp3
Gathering file list...
[1/1] (100.00%) csum: /media/AuroraMusic/Archive/Pre-Deezer.Tidal/Audio/Amarok/2016/Djs From Mars - Bootzilla Vol.3 (2015)/Club Edition/10 Major Lazer & Dj Snake feat. MØ Vs Knife Party - Lean On Vs Bonfire (Djs From Mars Club Bootleg).mp3
Hashfile "(null)" written
Loading only identical files from hashfile.
Simple read and compare of file data found 0 instances of files that might benefit from deduplication.
Loading only duplicated hashes from hashfile.
Found 0 identical extents.
Simple read and compare of file data found 0 instances of extents that might benefit from deduplication.

JackSlateur commented 7 months ago

This is getting more and more interesting! Can all those files be read successfully? Does your dmesg show I/O errors, perhaps?

Can you run this without error (I do not need the output): find /media/AuroraMusic/Archive/Pre-Deezer.Tidal/ -type f -exec xxhsum {} \;
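A variant of this check puts a time limit on each read, so that a file which never finishes is reported instead of stalling the whole run. This is a minimal sketch, assuming GNU `find` and `timeout`; the function name `find_unreadable` and the default limit of 60 seconds are arbitrary choices made here, and plain `cat` stands in for a checksum tool.

```shell
# find_unreadable DIR [SECONDS]
# Tries to read every regular file under DIR and prints the ones that fail
# or do not finish within SECONDS (default 60). timeout exits with status
# 124 when it had to stop the command.
find_unreadable() {
    find "$1" -type f -exec sh -c '
        limit=$1
        shift
        for f in "$@"; do
            timeout "$limit" cat -- "$f" > /dev/null 2>&1
            status=$?
            [ "$status" -eq 0 ] || printf "status %s: %s\n" "$status" "$f"
        done
    ' sh "${2:-60}" {} +
}
```

Piping the result through `cat -v` keeps any control characters in the reported names from reaching the terminal. A process blocked in uninterruptible I/O may not respond to the signal sent by `timeout`, so this is only a rough probe.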

PrawnMan commented 7 months ago

The output of the command that you mentioned stops at the same file. The plot thickens...

dmesg doesn't show any I/O errors after I ran the xxhsum command.