mamedev / mame

MAME
https://www.mamedev.org/

chdman createcd doesn't work with input path containing accents on Windows #12095

Open · Sincasios opened this issue 4 months ago

Sincasios commented 4 months ago

MAME version

0.263

System information

Windows 11 (Spanish)

INI configuration details

N/A

Emulated system/software

N/A

Incorrect behaviour

createcd fails when the input path (-i) contains accented characters.

Create a folder whose name contains an accented character, for example D:\Té, and put an ".iso" or ".bin/.cue" image inside it.

Then pass a file from that folder to the "-i" argument:

>chdman createcd -i "D:\Té\Something.iso" -o "Something.chd"
chdman - MAME Compressed Hunks of Data (CHD) manager 0.263 (mame0263)
Error parsing input file (D:\T├®\Something.iso: No such file or directory)

Expected behaviour

It should open and read "D:\Té\Something.iso".

Steps to reproduce

  1. Create a directory whose name contains accented characters, such as "Emulación".
  2. Put an .iso or .bin/.cue image inside this folder.
  3. Run createcd with a file from that folder as the "-i" argument.

Additional details

The -o argument handles accented characters correctly.

"createdvd" prints garbled characters to the console, but it works (it reads the input and generates the output):

> chdman.exe createdvd -i D:\Té\Something.iso -o D:\Té\Test.chd
chdman - MAME Compressed Hunks of Data (CHD) manager 0.263 (mame0263)
Output CHD:   D:\T├®\Test.chd
Input file:   D:\T├®\Something.iso
Compression:  lzma (LZMA), zlib (Deflate), huff (Huffman), flac (FLAC)
Logical size: 211,288,064
Compressing, 24.5% complete... (ratio=56.4%)
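
The "├®" in both outputs is most likely mojibake rather than a damaged path. The sketch below assumes chdman writes the path as UTF-8 and the console displays it using code page 850, the usual OEM code page for Spanish-language Windows: "é" (U+00E9) encodes as the UTF-8 bytes C3 A9, and code page 850 maps those two bytes to "├" and "®". Read through Windows-1252 instead (assumed here to be the ANSI code page), the same bytes become "Ã©". The helper names and the two-entry code page tables are illustrative assumptions, not MAME code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Encode one Unicode scalar value as UTF-8 (1 to 4 bytes).
std::string utf8_encode(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Reinterpret each byte through a single-byte code page given as a partial
// byte -> code point table. ASCII bytes map to themselves; non-ASCII bytes
// missing from the table become U+FFFD.
std::u32string decode_single_byte(const std::string &bytes,
                                  const std::map<unsigned char, char32_t> &table) {
    std::u32string out;
    for (char c : bytes) {
        const unsigned char b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            out += static_cast<char32_t>(b);
        } else {
            auto it = table.find(b);
            out += (it != table.end()) ? it->second : U'\uFFFD';
        }
    }
    return out;
}

// Deliberately partial tables: only the two bytes that make up "é" in UTF-8.
const std::map<unsigned char, char32_t> cp850_partial = {
    {0xC3, U'\u251C'},  // "├" in code page 850 (OEM, assumed console page)
    {0xA9, U'\u00AE'},  // "®" in code page 850
};
const std::map<unsigned char, char32_t> cp1252_partial = {
    {0xC3, U'\u00C3'},  // "Ã" in Windows-1252 (assumed ANSI page)
    {0xA9, U'\u00A9'},  // "©" in Windows-1252
};
```

For example, decode_single_byte(std::string("T") + utf8_encode(0xE9), cp850_partial) yields U"T├®", the same text the console printed for "Té".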

ajrhacker commented 4 months ago

The reason for this problem is that Windows doesn't accept UTF-8 filenames through its narrow (char-based) file APIs. chdman createdvd works because its implementation uses MAME's osd_file::open, whose Windows version calls the native Windows file APIs and translates all filenames to UTF-16. Unfortunately, MAME's CD-ROM parsers mostly use fopen, which is known not to accept UTF-8 filenames on Windows.
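
The UTF-8 to UTF-16 translation described above can be sketched as a small decoder. This is a hypothetical, portable stand-in, not MAME's code: real Windows code would normally call MultiByteToWideChar with CP_UTF8 and pass the wide result to a wide-character API such as _wfopen or CreateFileW instead of fopen. The name utf8_to_utf16 and the choice to replace malformed bytes with U+FFFD are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not MAME's implementation: decode a UTF-8 string
// into UTF-16, the encoding Windows' wide-character file APIs expect.
// Malformed or truncated sequences are replaced with U+FFFD.
std::u16string utf8_to_utf16(const std::string &in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte determines the sequence length and its payload bits.
        if (lead < 0x80) { cp = lead; len = 1; }
        else if (lead >= 0xC2 && lead <= 0xDF) { cp = lead & 0x1F; len = 2; }
        else if (lead >= 0xE0 && lead <= 0xEF) { cp = lead & 0x0F; len = 3; }
        else if (lead >= 0xF0 && lead <= 0xF4) { cp = lead & 0x07; len = 4; }
        else { out += u'\uFFFD'; ++i; continue; }  // not a valid lead byte

        if (i + len > in.size()) { out += u'\uFFFD'; break; }  // truncated at end

        // Each continuation byte must look like 10xxxxxx and adds 6 bits.
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong 3/4-byte forms, UTF-16 surrogates and values > U+10FFFF.
        if (!ok || (len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            out += u'\uFFFD';
            ++i;
            continue;
        }

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);  // BMP: one UTF-16 code unit
        } else {
            cp -= 0x10000;                     // supplementary plane: surrogate pair
            assert(cp <= 0xFFFFF);
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}
```

With the path from the report, utf8_to_utf16("D:\\T\xC3\xA9\\Something.iso") returns u"D:\\T\u00E9\\Something.iso", where "é" is the single UTF-16 unit U+00E9. Characters outside the Basic Multilingual Plane, such as emoji, become surrogate pairs.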

invertego commented 4 months ago

mame could opt in to UTF-8 support with a manifest setting just as it currently does for DPI awareness and long path awareness. Currently only the main executable embeds a custom manifest, so the tools would need to be updated to do this as well.

https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests#activeCodePage

cuavas commented 4 months ago

> mame could opt in to UTF-8 support with a manifest setting just as it currently does for DPI awareness and long path awareness. Currently only the main executable embeds a custom manifest, so the tools would need to be updated to do this as well.
>
> https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests#activeCodePage

That requires a relatively new version of Windows to work (we still support older versions of Windows), and using it breaks the code in util::core_file that’s supposed to use the user’s legacy “ANSI code page”.

invertego commented 4 months ago

> That requires a relatively new version of Windows to work (we still support older versions of Windows), and using it breaks the code in util::core_file that’s supposed to use the user’s legacy “ANSI code page”.

What do you expect to break? If the user sets UTF-8 as their system-wide code page, it will have the same effect on MAME as this manifest setting (so the same things will break).

cuavas commented 4 months ago

>> That requires a relatively new version of Windows to work (we still support older versions of Windows), and using it breaks the code in util::core_file that’s supposed to use the user’s legacy “ANSI code page”.
>
> What do you expect to break? If the user sets UTF-8 as their system-wide code page, it will have the same effect on MAME as this manifest setting (so the same things will break).

Well if the user doesn’t set their system-wide ANSI code page to UTF-8, they shouldn’t be expecting applications to use UTF-8 for things that are nominally supposed to use the system-wide ANSI code page.

But that aside, the code doesn’t work for ANSI code pages where there are more than two possible coding lengths for characters (e.g. GB2312 and Shift-JIS work, GB18030 and UTF-8 don’t).
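
To illustrate the length problem: a UTF-8 character takes 1, 2, 3, or 4 bytes (GB18030 likewise uses 1, 2, or 4), so code that only distinguishes single-byte and double-byte characters splits such text at the wrong boundaries. The sketch compares real UTF-8 boundaries with a hypothetical decoder that treats every non-ASCII byte as the start of a two-byte character. That simplified rule is an assumption standing in for any one-or-two-byte decoder; it is not MAME's actual conversion logic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Byte length of the UTF-8 sequence starting with `lead`, or 0 if `lead`
// cannot begin a sequence (a continuation byte or an invalid byte).
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

// Per-character byte lengths using the real UTF-8 lead-byte rules.
// Bytes that cannot start a sequence, or a truncated final sequence,
// are counted as one-byte units. Continuation bytes are not validated.
std::vector<std::size_t> utf8_char_lengths(const std::string &s) {
    std::vector<std::size_t> lengths;
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t len = utf8_sequence_length(static_cast<unsigned char>(s[i]));
        if (len == 0 || i + len > s.size())
            len = 1;
        lengths.push_back(len);
        i += len;
    }
    assert(i == s.size());  // every byte was assigned to exactly one unit
    return lengths;
}

// Hypothetical simplified decoder: an ASCII byte is one character, and any
// other byte is assumed to start a two-byte character. A lone trailing
// byte is counted as length 1.
std::vector<std::size_t> naive_two_length_split(const std::string &s) {
    std::vector<std::size_t> lengths;
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t len = static_cast<unsigned char>(s[i]) < 0x80 ? 1 : 2;
        if (i + len > s.size())
            len = s.size() - i;
        lengths.push_back(len);
        i += len;
    }
    return lengths;
}
```

For "Aé日😀" (10 bytes of UTF-8), utf8_char_lengths returns {1, 2, 3, 4}, while naive_two_length_split returns {1, 2, 2, 2, 2, 1}: "é" happens to fit the two-byte rule, but "日" and "😀" are cut in the wrong places.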

invertego commented 4 months ago

> Well if the user doesn’t set their system-wide ANSI code page to UTF-8, they shouldn’t be expecting applications to use UTF-8 for things that are nominally supposed to use the system-wide ANSI code page.

Ah, if supporting legacy code pages is a goal then I understand why you can't override it.

> But that aside, the code doesn’t work for ANSI code pages where there are more than two possible coding lengths for characters (e.g. GB2312 and Shift-JIS work, GB18030 and UTF-8 don’t).

I found the code you are referring to (in osd_uchar_from_osdchar) and yes, unfortunately that's already broken in its current state, but I see there's already a FIXME comment.

Thanks for explaining the reasoning here; guess there's no shot at a free lunch.

crashGG commented 3 months ago

Similarly, chdman currently cannot create files whose names contain East Asian characters, such as Chinese, Japanese, or Korean.