lvandeve / lodepng

PNG encoder and decoder in C and C++.
zlib License
2.04k stars 420 forks source link

Add support for non-ASCII filenames under Windows #136

Open Cvelth opened 3 years ago

Cvelth commented 3 years ago

This library uses const char* (std::string for C++ function) as an input type for filenames. This works great under *nix systems, but fails to access files with non-ANSI characters in their paths when used under Windows (MSVC). This is caused by the fact that Windows uses utf-16 internally (instead of *nix's utf-8), characters of which cannot be stored in 8 bytes (char). To fix this issue, wchar_t needs to be used.

So, I implemented overloads for every function accepting filename, which are only available with MSVC (if _MSC_VER is defined in the environment) toolkit under Windows (if _WIN32 is defined), if LODEPNG_NO_COMPILE_DISK is not defined.

These overloads can, additionally be disabled by defining LODEPNG_NO_WIDE_CHAR_OVERLOADS.

I'm open to any suggestions and will try to implement additional fixes to the PR if proposed.

Looking forward to the reply.

P. S. I've additionally took liberty to fix (just run regex-powered auto-replace) an inconsistency in #endif commentaries where some of which were defined as #endif /* LODEPNG_COMPILE_DISK */ even though most of them were formatted as #endif /*LODEPNG_COMPILE_DISK*/.

mmp commented 3 years ago

Nice! Is there any chance of merging this, @lvandeve?

I was about to prepare a PR to support UTF8 encoded filenames on Windows for similar reasons but was going to follow the approach taken in stb_image, where char * filenames are optionally assumed to point to UTF8-encoded strings that are then converted to UTF16 before being passed to the corresponding Windows APIs that take UTF16 for filenames...

(I'm happy to go ahead and do that, if that would be preferable.)

lvandeve commented 3 years ago

Filenames and opening of files are platform dependent: a non-ANSI-C compatible function _wfopen is required for this.

lodepng uses pure ANSI C (except for the optional C++ parts) and tries to be as platform-independent as possible, even avoiding platform dependent #ifdefs where possible to avoid potential breakages (ANSI C is the biggest reliable constant here)

Perhaps it was a mistake to have filename-based functions at all inside of the library, as its main focus is to decode PNG images. It seemed like a good idea because C looks like it has the required functionality, except as it turns out not all platforms are compatible with it.

At this point, I'd rather add a warning to the ###_file functions that they don't work correctly with systems that use 16-bit filename characters, and eventually deprecate those functions, than widen the scope of the file loading for this.

lodepng can be used without the ###_file functions: Windows' API provides better functions for loading files in Windows, and then lodepng can be used to decode from the memory buffer. lodepng doesn't actually do anything efficient with files itself either: it also loads it into a memory buffer and then decodes from that

Would that work?

Cvelth commented 3 years ago

Thanks for your answer.

The thing is, for me personally, merging of this PR is not essential anymore, I used my forked version for a while, after which the project in question was moved to newer Windows version and as are result its code page could be converted to utf-8.

For those who are interested and/or encountered the same problem, you can read about it here. Take notice, it's only possible for applications built for Windows 10 Version 1903 (May 2019 Update) and newer.

P. S. Explicit warning are always a good idea, they will probably save someone painful debugging in the future.

mmp commented 3 years ago

Fair point about the “decode from a buffer” option; that works for me as well. (And I share your surprise that plain old getting a FILE * for a file on disk leads into weird non-portable ugliness!)

On Tue, Jun 15, 2021 at 2:51 PM Cvelth @.***> wrote:

Thanks for your answer.

The thing is, for me personally, merging of this PR is not essential anymore, I used my forked version for a while, after which the project in question was moved to newer Windows version and as are result its code page could be converted into a utf-8.

For those who are interested and/or encountered the same problem, you can read about it here https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page#set-a-process-code-page-to-utf-8. Take notice, it's only possible for applications built for Windows 10 Version 1903 (May 2019 Update) and newer.

P. S. Explicit warning are always a good idea, that could save someone painful debugging in the future.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lvandeve/lodepng/pull/136#issuecomment-861749764, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAZBJ5SXNILDRMEY6PFICLTS6OJDANCNFSM4QIT6CCA .