markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

file names are printed verbatim which can cause terminal to block/stop on control character (konsole-23.08.5) #353

Open Alex-K37 opened 2 weeks ago

Alex-K37 commented 2 weeks ago

Tested with duperemove 0.12 and 0.14.1

Example:

> ls -1
TeÃ?t
Teßt
Te?t
> ls -1b
TeÃ\302\237t
Teßt
Te\302\237t

The first and third names include the Unicode code point U+009F, a C1 control character that konsole apparently interprets: konsole stops as soon as duperemove outputs this character. I did not observe the behaviour on xterm, byobu, or the plain virtual console.
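To make the link between U+009F and the `\302\237` shown by `ls -1b` explicit, here is a minimal sketch of the two-byte UTF-8 encoding. The function name `utf8_encode_small` is made up for illustration and only handles code points below U+0800.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative helper (hypothetical name): encode a code point below
 * U+0800 as UTF-8 and return the number of bytes written to `out`.
 */
size_t utf8_encode_small(uint32_t cp, unsigned char out[2])
{
    if (cp < 0x80) {
        /* ASCII range: a single byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    /* Two-byte form: 110xxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
}
```

For U+009F this gives the bytes 0xC2 0x9F, which are 302 and 237 in octal, i.e. the sequence printed by `ls -1b`.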

Frankly, I am surprised that certain UNIX/Linux terminal emulators (konsole-23.08.5 on OpenSuSE 15.6) adopt new control characters from Unicode; however, this seems to be the current approach.

JackSlateur commented 3 days ago

Hello @Alex-K37. What a weird issue!

As far as I know, filenames are opaque arrays of bytes with very few restrictions. So you can totally push all kinds of control characters, which will mess with your vty in various ways.

I have no solution to prevent that. You can use the --quiet flag, which will prevent filenames from being shown.
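For reference, one way a program could guard against this on its own side is to escape control bytes before printing a name. The following is only a sketch under assumptions: `escape_name` is a hypothetical helper, not something duperemove provides, and it assumes the name is UTF-8 so that C1 controls appear as 0xC2 followed by 0x80..0x9F.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of duperemove: copy `name` into `out`,
 * replacing C0 controls, DEL, and UTF-8 encoded C1 controls
 * (0xC2 followed by 0x80..0x9F) with octal escapes in the style of `ls -b`.
 * Returns the length of the escaped string, or -1 if `out` is too small.
 */
int escape_name(const char *name, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)name;
    size_t n = 0;

    if (out_size == 0)
        return -1;

    while (*p) {
        int c1 = (p[0] == 0xC2 && p[1] >= 0x80 && p[1] <= 0x9F);
        int c0 = (p[0] < 0x20 || p[0] == 0x7F);
        size_t len = c1 ? 2 : 1;   /* input bytes consumed this round */

        for (size_t i = 0; i < len; i++) {
            if (c0 || c1) {
                /* backslash + 3 octal digits + room for the final NUL */
                if (n + 5 > out_size)
                    return -1;
                n += (size_t)snprintf(out + n, 5, "\\%03o", (unsigned)p[i]);
            } else {
                /* ordinary byte + room for the final NUL */
                if (n + 2 > out_size)
                    return -1;
                out[n++] = (char)p[i];
            }
        }
        p += len;
    }
    out[n] = '\0';
    return (int)n;
}
```

With this, the name containing U+009F comes out as `Te\302\237t`, while `Teßt` passes through unchanged. A literal backslash in a name is left as-is, so the output is meant for display only and cannot be reliably converted back.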

Alex-K37 commented 3 days ago

I had a discussion about this on a Linux distribution forum. You seem to be correct about the "opaque arrays of bytes" convention.

Yet why can 'ls' display file names containing all sorts of control characters without running into the same issue? I never had to deal with terminal control characters before in my simple command line programs.

I can only guess that some compiler switch, in combination with the text output library, enables filtering of control characters for 'ls'. It could also be a versioning problem, or a missing adaptation to Unicode that allows Unicode C1 control characters to slip through. This is currently out of my league with respect to glibc, or whatever system library provides the stdout/stderr output functionality.

I was unaware of the C1 code points until I looked up the UTF-8 encoding in connection with this issue. Apparently KDE Konsole does interpret them.
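On the `ls` question: the help text of GNU `ls` describes `-q, --hide-control-chars` as "print ? instead of nongraphic characters" and notes that control characters are shown as-is by default unless the program is `ls` and the output is a terminal. So the `?` in the plain `ls -1` listing appears to come from `ls` itself rather than from a compiler switch or the C library. A rough sketch of that idea follows; the function names are hypothetical and the code is not taken from coreutils or duperemove. It assumes UTF-8 names.

```c
#include <unistd.h>

/*
 * Hypothetical helper: replace C0 controls, DEL and UTF-8 encoded C1
 * controls (0xC2 followed by 0x80..0x9F) with a single '?', in place.
 */
void hide_control_chars(char *name)
{
    unsigned char *src = (unsigned char *)name;
    unsigned char *dst = src;

    while (*src) {
        if (src[0] == 0xC2 && src[1] >= 0x80 && src[1] <= 0x9F) {
            *dst++ = '?';          /* two input bytes become one '?' */
            src += 2;
        } else if (src[0] < 0x20 || src[0] == 0x7F) {
            *dst++ = '?';
            src++;
        } else {
            *dst++ = *src++;       /* printable byte, copy unchanged */
        }
    }
    *dst = '\0';
}

/* Only filter when stdout is a terminal, so pipes still get the raw bytes. */
void prepare_name_for_stdout(char *name)
{
    if (isatty(STDOUT_FILENO))
        hide_control_chars(name);
}
```

Applied to the example names, this turns the name containing U+009F into `Te?t`, matching the plain `ls -1` listing, while output redirected to a file or pipe keeps the original bytes.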