shundhammer / qdirstat

QDirStat - Qt-based directory statistics (KDirStat without any KDE - from the original KDirStat author)
GNU General Public License v2.0

[Qt4] Files and directories with UTF-8 special characters in the name not read correctly #61

Closed: slodki closed this issue 7 years ago

slodki commented 7 years ago

version: 1.3
system: Ubuntu zesty
package: 1.3-1~zesty (amd64)

$ ls -lR
.:
razem 112
-rw-rw-r-- 1 slodki slodki 51200 maj 31 19:51 file3_ą
-rw-rw-r-- 1 slodki slodki 51200 maj 31 19:51 file4
drwxrwxr-x 2 slodki slodki  4096 maj 31 19:51 krowa
drwxrwxr-x 2 slodki slodki  4096 maj 31 19:51 żółw

./krowa:
razem 52
-rw-rw-r-- 1 slodki slodki 51200 maj 31 19:51 file1

./żółw:
razem 52
-rw-rw-r-- 1 slodki slodki 51200 maj 31 19:51 file2
$ locale
LANG=pl_PL.UTF-8
LANGUAGE=
LC_CTYPE="pl_PL.UTF-8"
LC_NUMERIC="pl_PL.UTF-8"
LC_TIME="pl_PL.UTF-8"
LC_COLLATE="pl_PL.UTF-8"
LC_MONETARY="pl_PL.UTF-8"
LC_MESSAGES="pl_PL.UTF-8"
LC_PAPER="pl_PL.UTF-8"
LC_NAME="pl_PL.UTF-8"
LC_ADDRESS="pl_PL.UTF-8"
LC_TELEPHONE="pl_PL.UTF-8"
LC_MEASUREMENT="pl_PL.UTF-8"
LC_IDENTIFICATION="pl_PL.UTF-8"
LC_ALL=

All files and directories with Polish (or Cyrillic) characters in their names are skipped/ignored by QDirStat (screenshot: qdirstat-utf8).

The same directory works without problems in all other KDE/Qt/CLI apps (screenshot: dolphin-utf8).

slodki commented 7 years ago

Log (the lstat() warnings say "No such file or directory" in Polish):

2017-05-31 20:16:25.019 [25062] <Info>    Logger.cpp:138 openLogFile():  -- Log Start --
2017-05-31 20:16:25.062 [25062] <Debug>   TreemapView.cpp:46 TreemapView():  
2017-05-31 20:16:25.082 [25062] <Debug>   DirTreeModel.cpp:544 sort():  Sorting by NameCol ascending
2017-05-31 20:16:25.082 [25062] <Debug>   TreemapView.cpp:117 setSelectionModel():  
2017-05-31 20:16:25.083 [25062] <Info>    Cleanup.cpp:415 desktopSpecificApps():  Detected desktop "KDE"
2017-05-31 20:16:25.083 [25062] <Info>    Cleanup.cpp:468 desktopSpecificApps():  %filemanager => "konqueror --profile filemanagement"
2017-05-31 20:16:25.083 [25062] <Info>    Cleanup.cpp:468 desktopSpecificApps():  %terminal => "konsole --workdir %d"
2017-05-31 20:16:25.084 [25062] <Debug>   DebugHelpers.cpp:133 dumpExcludeRules():  <ExcludeRule ".snapshot">
2017-05-31 20:16:25.084 [25062] <Info>    MainWindow.cpp:792 toggleVerboseSelection():  Verbose selection is now off. Change this with Shift-F7.
2017-05-31 20:16:25.099 [25062] <Info>    DirTree.cpp:95 startReading():     url: "/tmp/aaa"
2017-05-31 20:16:25.099 [25062] <Info>    DirTree.cpp:98 startReading():  device: /dev/mapper/ssd-root
2017-05-31 20:16:25.099 [25062] <Debug>   DirTreeModel.cpp:544 sort():  Sorting by NameCol ascending
2017-05-31 20:16:25.100 [25062] <Debug>   DirReadJob.cpp:333 stat():  url: "/tmp/aaa"
2017-05-31 20:16:25.100 [25062] <Debug>   MainWindow.cpp:582 expandTreeToLevel():  Expanding tree to level 1
2017-05-31 20:16:25.110 [25062] <WARNING> DirReadJob.cpp:281 startReading():  lstat(/tmp/aaa/file3_Ä) failed: Nie ma takiego pliku ani katalogu                                                                                                           
2017-05-31 20:16:25.110 [25062] <WARNING> DirReadJob.cpp:281 startReading():  lstat(/tmp/aaa/żóÅw) failed: Nie ma takiego pliku ani katalogu                                                                                                            
2017-05-31 20:16:25.135 [25062] <WARNING> [Qt] QFileSystemWatcher: failed to add paths: /home/slodki/.config/ibus/bus
2017-05-31 20:16:25.135 [25062] <WARNING> [Qt] Bus::open: Can not get ibus-daemon's address. 
2017-05-31 20:16:25.135 [25062] <Verbose> [Qt] IBusInputContext::createInputContext: no connection to ibus-daemon 
2017-05-31 20:16:25.147 [25062] <Info>    MainWindow.cpp:456 readingFinished():  
2017-05-31 20:16:25.151 [25062] <Debug>   DirTreeModel.cpp:544 sort():  Sorting by PercentNumCol descending
2017-05-31 20:16:25.151 [25062] <Debug>   MainWindow.cpp:439 idleDisplay():  No current branch - expanding tree to level 1
2017-05-31 20:16:25.151 [25062] <Debug>   MainWindow.cpp:582 expandTreeToLevel():  Expanding tree to level 1
2017-05-31 20:16:25.300 [25062] <Debug>   MainWindow.cpp:582 expandTreeToLevel():  Expanding tree to level 1
2017-05-31 20:16:41.719 [25062] <Debug>   TreemapView.cpp:173 writeSettings():  
2017-05-31 20:16:41.734 [25062] <Info>    Logger.cpp:79 ~Logger():  -- Log End --
slodki commented 7 years ago

Is a fromUtf8() call missing, perhaps? Or would fromLocal8Bit() be better (since it uses the locale)?

Using QFile::encodeName() would be another approach.

shundhammer commented 7 years ago

No, it's there:

https://github.com/shundhammer/qdirstat/blob/master/src/DirReadJob.cpp#L179

What filesystem type is that? Are there any special mount options?

shundhammer commented 7 years ago

See also issue #19:


It works in general, for both files and directories with UTF-8 special characters. Something must be different in your setup.

shundhammer commented 7 years ago

(Screenshot: qdirstat-utf9)

slodki commented 7 years ago

You use toUtf8() when passing names to libc functions, but not fromUtf8() when storing them in a QString.

The problem doesn't depend on the filesystem type: all of them (ext4, CIFS, NFS) are mounted with UTF-8 support and work fine with other Qt/CLI apps.

BTW: when a directory name with UTF-8 characters is given as the starting parameter, that name works correctly. Log file for "qdirstat żółw":

2017-05-31 20:49:06.165 [25576] <Info>    Logger.cpp:138 openLogFile():  -- Log Start --
2017-05-31 20:49:06.212 [25576] <Debug>   TreemapView.cpp:46 TreemapView():  
2017-05-31 20:49:06.237 [25576] <Debug>   DirTreeModel.cpp:544 sort():  Sorting by NameCol ascending
2017-05-31 20:49:06.237 [25576] <Debug>   TreemapView.cpp:117 setSelectionModel():  
2017-05-31 20:49:06.238 [25576] <Info>    Cleanup.cpp:415 desktopSpecificApps():  Detected desktop "KDE"
2017-05-31 20:49:06.238 [25576] <Info>    Cleanup.cpp:468 desktopSpecificApps():  %filemanager => "konqueror --profile filemanagement"
2017-05-31 20:49:06.238 [25576] <Info>    Cleanup.cpp:468 desktopSpecificApps():  %terminal => "konsole --workdir %d"
2017-05-31 20:49:06.239 [25576] <Debug>   DebugHelpers.cpp:133 dumpExcludeRules():  <ExcludeRule ".snapshot">
2017-05-31 20:49:06.239 [25576] <Info>    MainWindow.cpp:792 toggleVerboseSelection():  Verbose selection is now off. Change this with Shift-F7.
2017-05-31 20:49:06.255 [25576] <Info>    DirTree.cpp:95 startReading():     url: "/tmp/aaa/żółw"
2017-05-31 20:49:06.255 [25576] <Info>    DirTree.cpp:98 startReading():  device: /dev/mapper/ssd-root
2017-05-31 20:49:06.255 [25576] <Debug>   DirTreeModel.cpp:544 sort():  Sorting by NameCol ascending
2017-05-31 20:49:06.256 [25576] <Debug>   DirReadJob.cpp:333 stat():  url: "/tmp/aaa/żółw"
2017-05-31 20:49:06.256 [25576] <Debug>   MainWindow.cpp:582 expandTreeToLevel():  Expanding tree to level 1
2017-05-31 20:49:06.271 [25576] <WARNING> [Qt] QFileSystemWatcher: failed to add paths: /home/slodki/.config/ibus/bus
2017-05-31 20:49:06.271 [25576] <WARNING> [Qt] Bus::open: Can not get ibus-daemon's address. 
2017-05-31 20:49:06.271 [25576] <Verbose> [Qt] IBusInputContext::createInputContext: no connection to ibus-daemon 
2017-05-31 20:49:06.271 [25576] <Info>    MainWindow.cpp:456 readingFinished():  
2017-05-31 20:49:06.275 [25576] <Debug>   DirTreeModel.cpp:544 sort():  Sorting by PercentNumCol descending
2017-05-31 20:49:06.275 [25576] <Debug>   MainWindow.cpp:439 idleDisplay():  No current branch - expanding tree to level 1
2017-05-31 20:49:06.275 [25576] <Debug>   MainWindow.cpp:582 expandTreeToLevel():  Expanding tree to level 1
2017-05-31 20:49:06.456 [25576] <Debug>   MainWindow.cpp:582 expandTreeToLevel():  Expanding tree to level 1
2017-05-31 20:49:27.469 [25576] <Debug>   TreemapView.cpp:173 writeSettings():  
2017-05-31 20:49:27.473 [25576] <Info>    Logger.cpp:79 ~Logger():  -- Log End --
slodki commented 7 years ago

The issue is about files and directories being skipped/ignored, not about how their names are displayed. As you can see, the directory with such characters is treated as a file, and all such files are reported as empty (0 bytes).

slodki commented 7 years ago

The Polish letter ą is supported by my configuration, which uses a UTF-8 locale:

$ ls -l $'/tmp/aaa/file3_\u0105'
-rw-rw-r-- 1 slodki slodki 51200 maj 31 19:51 /tmp/aaa/file3_ą

$ strace ls -l $'/tmp/aaa/file3_\u0105' |& grep lstat.*file3
lstat("/tmp/aaa/file3_\304\205", {st_mode=S_IFREG|0664, st_size=51200, ...}) = 0

As you can see, everything works when the 2-byte encoding 0xC4 0x85 is used.
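For reference, that 2-byte form follows directly from the UTF-8 bit layout: U+0105 is 0x105 = 0b00100000101 (11 bits), whose top 5 bits (0b00100) go into a 110xxxxx lead byte and whose low 6 bits (0b000101) go into a 10xxxxxx continuation byte, giving 0xC4 0x85, which strace prints in octal as \304\205. A small Python sketch of that arithmetic; the helper name utf8_two_byte is made up for illustration and only handles the 2-byte range:

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as a 2-byte UTF-8 sequence."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point is outside the 2-byte UTF-8 range")
    # Lead byte 110xxxxx carries the top 5 bits, continuation byte 10xxxxxx the low 6.
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


a_ogonek = utf8_two_byte(0x0105)  # "ą" is U+0105 LATIN SMALL LETTER A WITH OGONEK
assert a_ogonek == "ą".encode("utf-8")  # agrees with Python's own encoder
print(a_ogonek.hex(" "))  # c4 85
print("".join(f"\\{b:03o}" for b in a_ogonek))  # \304\205, the form strace prints
```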

But QDirStat tries to use the 4 bytes 0xC3 0x84 0xC2 0x85 instead:

$ strace qdirstat . |& grep lstat.*file3
lstat("/tmp/aaa/file3_\303\204\302\205", 0x7ffcb9021a60) = -1 ENOENT (No such file or directory)
slodki commented 7 years ago

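And as you can see, 0xC3 0x84 is the UTF-8 encoding of the Latin-1 character 0xC4 (Ä), and 0xC2 0x85 is the UTF-8 encoding of 0x85.

In my opinion, QDirStat is doing something like toUtf8(toUtf8(QString)). And when the name doesn't go through that extra conversion (as with the command line parameter), there is no error.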
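The observed bytes are what you get if the UTF-8 name from the directory listing is read one byte per character (as Latin-1, so 0xC4 becomes Ä and 0x85 becomes the control character U+0085) and then encoded to UTF-8 again. A Python sketch reproducing the strace result under that assumption; misdecode_then_reencode is a hypothetical name, and Python's bytes.decode("latin-1") / str.encode("utf-8") merely stand in for the corresponding QString conversions:

```python
def misdecode_then_reencode(raw: bytes) -> bytes:
    """Read raw file-name bytes as Latin-1 (one character per byte), then encode as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


on_disk = "file3_ą".encode("utf-8")  # bytes as stored on disk: b'file3_\xc4\x85'
passed_to_lstat = misdecode_then_reencode(on_disk)
print(passed_to_lstat)  # b'file3_\xc3\x84\xc2\x85', matching the failing lstat() call
```

Every byte of 0x80 or above turns into two bytes, so "żółw" (7 bytes on disk) would become a 13-byte name that does not exist.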

shundhammer commented 7 years ago

You are right, of course. Added that missing fromUtf8() call.

One thing I don't understand at all, though, is why it ever worked for me. I also have a UTF-8 environment (de_DE.utf8), and as you can see from the screenshots, not only did it display those files and directories, it also correctly displayed their metadata (size etc.).

OTOH the Qt docs explicitly say that QString uses fromUtf8() by default when it is constructed from a const char *:

http://doc.qt.io/qt-5/qstring.html#initializing-a-string

shundhammer commented 7 years ago

Does it work for you with commit https://github.com/shundhammer/qdirstat/commit/e4cf683195f91e63054d282ffbc70b2bdb08a478 ?

shundhammer commented 7 years ago

Just a thought: Are you using QDirStat built with Qt 4.x? In Qt 4.x, the QString constructor from const char * used fromAscii() rather than fromUtf8(). That might explain the different behaviour.

I am not 100% sure, but I think NHellFire's PPA builds QDirStat against Qt 4.

shundhammer commented 7 years ago

Bingo. I just downloaded and unpacked it, and voila:

[sh @ nazgul] ~/tmp 25 % ldd nhellfire-ppa/usr/bin/qdirstat | grep -i qt
    libQtGui.so.4 => /usr/lib/x86_64-linux-gnu/libQtGui.so.4 (0x00007f1accfc2000)
    libQtCore.so.4 => /usr/lib/x86_64-linux-gnu/libQtCore.so.4 (0x00007f1accadd000)

So if you are using that version of QDirStat, that would perfectly explain the discrepancy between your and my results.

But anyway, explicitly converting with fromUtf8() rather than relying on implicit behaviour is definitely the more reliable way.
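To illustrate the difference, here is a Python sketch comparing the two paths for the names from the original listing. It assumes Qt 4's implicit const char * conversion behaved like fromLatin1(), which is what QString::fromAscii() does when no codec has been set with QTextCodec::setCodecForCStrings(); the function names qt4_style and explicit_utf8 are hypothetical, and Python's decode()/encode() stand in for the QString calls:

```python
def qt4_style(raw: bytes) -> bytes:
    # Assumed implicit path: Latin-1 decode (Qt 4 fromAscii() without a C-string codec),
    # then toUtf8() when building the path for lstat().
    return raw.decode("latin-1").encode("utf-8")


def explicit_utf8(raw: bytes) -> bytes:
    # Explicit path: decode as UTF-8 (the role of fromUtf8()), then toUtf8() for lstat().
    return raw.decode("utf-8").encode("utf-8")


# Names from the original listing, as a UTF-8 system's readdir() would return them.
for name in ["file3_ą", "file4", "krowa", "żółw"]:
    raw = name.encode("utf-8")
    print(f"{name}: implicit round-trip ok={qt4_style(raw) == raw}, "
          f"explicit round-trip ok={explicit_utf8(raw) == raw}")
```

Only the two non-ASCII names fail the implicit path: ASCII bytes map to the same characters under Latin-1 and UTF-8, which fits the report that only file3_ą and żółw were affected.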

shundhammer commented 7 years ago

Please reopen if commit e4cf683 did not fix the problem for you.

slodki commented 7 years ago

I've tested it with Qt 4 and Qt 5, and both work OK now.

shundhammer commented 7 years ago

Thanks for confirming this!