hydrogen-music / hydrogen

The advanced drum machine for Linux, macOS, and Windows
http://www.hydrogen-music.org
GNU General Public License v2.0

Fix encoding issues #1961

Closed theGreatWhiteShark closed 2 weeks ago

theGreatWhiteShark commented 3 months ago

On Linux and macOS the encoding seems to be set to UTF-8 in general, so there are no problems with internationalization (at least to my knowledge as a Linux user). On Windows things are different. On my local machine the encoding is set to windows-1252, and a couple of file actions relying on the current OS encoding via QString::toLocal8Bit fail if characters outside of Latin-1 are encountered. This seems to be the case for some users too.
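To illustrate why a conversion to a Latin-1-style local encoding loses characters, here is a minimal sketch in plain C++ (no Qt). The helper `utf8ToLatin1Lossy` is hypothetical and not part of Hydrogen or Qt. It only approximates what a local 8-bit conversion might do, and replacing unrepresentable characters with `?` is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not Hydrogen or Qt code): converts UTF-8 text to
// Latin-1. Code points above U+00FF have no Latin-1 representation and are
// replaced by '?', which is an assumed stand-in for a lossy local conversion.
std::string utf8ToLatin1Lossy(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t codePoint = 0;
        std::size_t length = 1;

        // Determine the sequence length from the lead byte.
        if (lead < 0x80) {
            codePoint = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            codePoint = lead & 0x1F;
            length = 2;
        } else if ((lead & 0xF0) == 0xE0) {
            codePoint = lead & 0x0F;
            length = 3;
        } else if ((lead & 0xF8) == 0xF0) {
            codePoint = lead & 0x07;
            length = 4;
        } else {
            out.push_back('?');  // invalid lead byte
            ++i;
            continue;
        }

        if (i + length > utf8.size()) {
            out.push_back('?');  // truncated sequence at end of input
            break;
        }

        // Fold the continuation bytes into the code point.
        bool valid = true;
        for (std::size_t k = 1; k < length; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) {
                valid = false;
                break;
            }
            codePoint = (codePoint << 6) | (c & 0x3F);
        }
        if (!valid) {
            out.push_back('?');
            ++i;
            continue;
        }

        // Latin-1 covers U+0000..U+00FF only; everything else is lost.
        out.push_back(codePoint <= 0xFF ? static_cast<char>(codePoint) : '?');
        i += length;
    }
    return out;
}
```

With this sketch, "café" survives as four Latin-1 bytes, while a Cyrillic or CJK character collapses to `?`, so a file path built from the converted bytes no longer names the intended file.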

Things failing:

  1. Rendering artifacts in console output
  2. Log file is stored with system encoding instead of UTF-8 and non-Latin-1 characters are lost
  3. Song export to WAV/FLAC/..., MIDI, and LilyPond fails if non-Latin-1 characters are part of the resulting file path
  4. Drumkit import and export fail if non-Latin-1 characters are part of the resulting file path

Points 1, 2, and 3 were fixed by enforcing UTF-8 encoding for all text files and by refactoring the code to use Qt's methods and classes for file writing whenever possible.

Point 4, however, could not be fixed, or at least I do not see a nice solution. (We could make Qt instead of libarchive open the files and just pass the file pointers to libarchive, but this does not feel like a good and sustainable solution.) Instead, we now have the official limitation that drumkit archives must not carry non-ASCII characters in order for the kit to be supported on all platforms.

In case importing or exporting failed due to encoding issues, a dedicated dialog will tell the user what went wrong.
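The ASCII-only limitation could be checked up front, before a path is handed to the archive code. The following is a minimal sketch in plain C++, not the check used in the PR. The function name `isAsciiOnly` and the assumption that paths arrive as UTF-8 encoded `std::string` are illustrative.

```cpp
#include <algorithm>
#include <string>

// Hypothetical pre-flight check (not Hydrogen code): returns true if every
// byte of the UTF-8 encoded path is in the 7-bit ASCII range. Any multi-byte
// UTF-8 sequence contains bytes >= 0x80 and therefore fails the check.
bool isAsciiOnly(const std::string& utf8Path) {
    return std::all_of(utf8Path.begin(), utf8Path.end(), [](char c) {
        return static_cast<unsigned char>(c) < 0x80;
    });
}
```

A caller could run this on each path inside a drumkit before import or export and show the dedicated dialog when it returns false, instead of letting the archive operation fail later.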

Fixes #1957

theGreatWhiteShark commented 2 months ago

This one turns out to be more tricky than I thought.

The solution I cooked up locally worked on my Linux machine (among other things, it replaces archive_entry_set_pathname with archive_entry_set_pathname_utf8). But it did not work with the libarchive version (3.4.0) used in our Linux build pipeline. In some versions, at least 3.6.2 and 3.7.2, this even results in a segfault (!!!). This is what makes the macOS pipeline fail. (I'm very glad I wrote a unit test checking drumkit import and export just to be sure.)

Windows is a different topic. This has to be fixed anyway and should be tackled as soon as the other things are stable.

It feels like we have to be especially careful here, delve into the libarchive code, and get some feedback from the libarchive team. That will probably take a while and require extensive testing. I'll do the JACK timebase integration test and fix first.

theGreatWhiteShark commented 2 weeks ago

Superseded by #1981 which does not attempt to include UTF-8 support for drumkit exporting.