Open Joachim-Otahal opened 2 years ago
I assume Windows is doing some legacy thing here that makes fopen()
not work on the strings that end up in argv
if those contain certain kinds of non-ascii characters. I consider that a bug in Windows more than anything else, but if there's a workaround we can/should implement it. Any dev familiar with Windows peculiarities who can help with that?
For comparison - following command lines work fine in other well known programs: Opens the picture: "c:\Program Files\XnViewMP\xnviewmp.exe" X:\pics\joyreactor\3d-art-красивые-картинки-art-пираты-6497178.jpeg Creating a tiny video: c:\prog\ffmpeg\bin\ffmpeg.exe -i X:\pics\joyreactor\3d-art-красивые-картинки-art-пираты-6497178.jpeg -c:v libx265 c:\tmp\3d-art-красивые-картинки-art-пираты-6497178.mkv Converts successfully: "c:\Program Files\ImageMagick\magick.exe" convert "X:\pics\joyreactor\3d-art-красивые-картинки-art-пираты-6497178.jpeg" -resize 25% c:\tmp\3d-art-красивые-картинки-art-пираты-6497178.jpeg
So the solution must be out there already implemented countless times.
Hmm, for me the Russian names work fine. Maybe it has something to do with the OS.
Imagemagick works too, but I cannot set "effort" ? And does "Quality 100" really mean lossless? Example / documentation missing, or my google-fu is broken. And it may be behind when it comes to your tool regarding bugs.
Hmm, for me the Russian names work fine. Maybe it has something to do with the OS.
You seem to have russian locale installed in your Windows. I only have German/English, whereas imagemagic, ffmpeg and so on can candle it.
You seem to have russian locale installed in your Windows
This is true. My OS installed from the Eng ISO, but after that WU automatically installed the Russian language pack.
I haven't Japanese locale and cmd
can't display the file name correctly, but cjxl
works fine.
The second screenshot is from Windows Terminal.
I haven't Japanese locale and
cmd
can't display the file name correctly, butcjxl
works fine. The second screenshot is from Windows Terminal.
Your left screenshot is just a font problem. Select, for example, Lucida Console or Liberation Mono as font, and your CMD will have Japanese characters. Could you test it with a Russian filename? See the Joyreactor link in the first post as example. Maybe it is a Windows 11 problem on my machine here.... But then: The same is on Server 2022....
is just a font problem
I use Cascadia Code, with Consolas the Japanese text is displayed the same way.
Could you test it with a Russian filename?
I have a Russian locale installed. https://github.com/libjxl/libjxl/issues/683#issuecomment-933402119
To rule out an Server 2022 or Windows 11 issue I installed a plain German Server 2019 test version in Hyper-V (+ run windows updates) to check on that. Server 2019 is equal to Windows 10 1809. It is still the same problem:
The virtual machine is configuration version 8.0, so any Windows 10 1607 (aka Anniversary) or later and any Server 2016 or later will be able to import it. It is not activated, will run the usual 180 days eval time. The Adminpassword is empty, so it will log on automatically. Download size is ~10 GB. https://joumxyzptlk.de/tmp/S2019-JPEGXL-CJXL.7z Tell me when I can remove it from my site :).
What happens if you run chcp 65001
in the terminal first before cjxl/djxl?
No change. Whether Windows 11 or the virtual plain Server 2019:
IMHO it should be possible to solve, and has been solved by others. Imagmagick, ffmpeg, xcopy, xnview and so on have no problems. This comment also means: I can remove that virtual test machine from my webserver since no one bothered to check it there.
use "UNICODE" versions of functions on Windows :
Oh, just that simple? Then I hope this will be incorporated in their own .EXE releases soon!
use "UNICODE" versions of functions on Windows :
- pass -DUNICODE to compiler flags
- wmain : https://docs.microsoft.com/en-us/cpp/c-language/using-wmain?view=msvc-170
- _wfopen : https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-170
Oh, just that simple? Then I hope this will be incorporated in their own .EXE releases soon!
Is there a way to do it just with compiler flags (i.e. so filenames are passed to main() and interpreted in fopen() as UTF-8), or do you always have to use the UTF-16 wmain/_wfopen variants to make unicode filenames work in Windows? If it's the latter, it's not that simple, since it means we have to implement UTF-16 wchar variants of the option parsing in cjxl/djxl...
According to Microsoft you can opt into UTF-8 by default by either specifying it via an application manifest or setting the minimum target version to Windows 10 1903, but that won't help users on an earlier version.
Users can also force UTF-8 system wide, which is a big hammer but does solve the issue (Unlike my earlier chcp
suggestion)
Users can also force UTF-8 system wide, which is a big hammer but does solve the issue (Unlike my earlier
chcp
suggestion)
Neither is necessary on the Windows side since Windows Vista. You don't need to change your region, regionale, or codepage - I can simply stay in my german OS and use Chinese, Russian, Korean whatever filenames within the .CMD window, within the same filename - including Farsi with its right-to-left writing style. This part of the code in CJXL seems to be over a decade behind. Though I personally would say: This specific issue has less priority over other issues since there are easy to workarounds in .CMD or .PS1 scripts. Examples of higher priority: Many pictures require that we have to play with -P -g and a huge range of -I values to get better compression than the input .PNG / .GIF instead of cjxl being able to do this automatically with a high effort setting. Or the huge amount of .GIF files, for example from older dilbert strips, which cjxl cannot compress better and blow up by 20% or more, no matter what options you try.
TODO: check if this is can be reproduced with cjxl_ng
.
TODO: check if this is can be reproduced with
cjxl_ng
.
Got a binary link? https://github.com/libjxl/libjxl/releases does not have a cjxl_ng.exe. Trying the latest from https://artifacts.lucaversari.it/libjxl/libjxl/latest/, which contains a cjxl_ng.exe, which throws this error:
Thanks for trying this!
Thanks for trying this!
If course I try. My current workaround in my powershell script is to create a hardlink in the filesysten with non-unicode name, encode that, and rename the result like the original filename (with .jxl of course).
Just for information: I've been using my powershell script to convert several 100'000 pictures to .JXL, including a workaround for this unicode problem. If there is interest I'll clean it up a bit and put it in my github repo.
I don't recommend you to promote the use of JPEG XL while it isn't stable yet, people is degrading their "lossless" transcodes by signaling degraded dequantization coefficients. See here. Work is underway I think. 0c60630d16fcee56c6d8b306cd95cd4ecf0edc7d
I've already transcoded my entire collection of photos. There's no degradation, it's completely reversible. JXL is very convenient JPG packer.
@Dogway From the link you provide I see the problem with your source images. In my tests I never had such an issue, bit-comparison showed no difference when opened in either XnVviewMP or Affinity photo with subtracting the original from the .jxl. Your issue might be dependent on other factors . The way the original .jpg are encoded in your case(s), small library differences depending on the underlying OS and so on. Though that issue has nothing to do with the unicode-filename problem discussed here.
I was only replying to:
Just for information: I've been using my powershell script to convert several 100'000 pictures to .JXL, (...). If there is interest I'll clean it up a bit and put it in my github repo.
And advising against promoting a software in beta that could damage people photos, if you add a warning, that's fine.
Thanks for trying this!
Any news on this, or a new build to test? Today's build from https://artifacts.lucaversari.it/libjxl/libjxl/latest/ still cannot handle unicode.
Just for information: I've been using my powershell script to convert several 100'000 pictures to .JXL, including a workaround for this unicode problem. If there is interest I'll clean it up a bit and put it in my github repo.
@Joachim-Otahal Yes please, I'd appreciate this. Been looking for a script like this.
Just for information: I've been using my powershell script to convert several 100'000 pictures to .JXL, including a workaround for this unicode problem. If there is interest I'll clean it up a bit and put it in my github repo.
@Joachim-Otahal Yes please, I'd appreciate this. Been looking for a script like this.
Working on it, making it "version 0.1" release ready. Currently adding, as v 0.2, a multi-thread variant with a maximum given number of threads. The multi-thread version is less verbose by design as the single thread version, but faster since everyone has more than one core. But I need a few days of testing before I put that online :D.
Here are two scripts, meant to drag and drop one or more directories on it, and it will start the conversion. They will DELETE the source if the ..JXL result is smaller and the conversion was without errors. Set $WhatIf to $true in the script if you don't trust my code. They include a workaround for the unicode issue, so mass converting Chinese/Korean/Persian/Russian/Japanese etc file names are not a problem any more. Default setting is "lossless with effort 8". Be so kind to report errors, but please do a good error report :D. https://github.com/Joachim-Otahal/PowerShell/tree/main/Jpeg-XL
This issue got a little sidetracked, but to return to what TheDecryptor said, here's a summary of the problem
Here's a tentative patch that adds a manifest. I have tested the {c,d}jxl-x64-windows-static binaries from building with this, and can confirm they fix this issue for me.
diff --git a/cmake/windows_utf8.manifest b/cmake/windows_utf8.manifest
new file mode 100644
index 00000000..dab929e1
--- /dev/null
+++ b/cmake/windows_utf8.manifest
@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
+ <application>
+ <windowsSettings>
+ <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
+ </windowsSettings>
+ </application>
+</assembly>
diff --git a/tools/CMakeLists.txt b/tools/CMakeLists.txt
index be1352cc..dd7d4ee6 100644
--- a/tools/CMakeLists.txt
+++ b/tools/CMakeLists.txt
@@ -332,6 +332,11 @@ foreach(BINARY IN LISTS TOOL_BINARIES)
endif()
set_target_properties(${BINARY} PROPERTIES LINK_FLAGS "${JXL_WASM_TOOLS_LINK_FLAGS}")
endif()
+
+ # Attach manifest that tells Windows to use UTF-8 for eg. fopen
+ if(MSVC)
+ target_sources(${BINARY} PRIVATE "${CMAKE_CURRENT_LIST_DIR}/../cmake/windows_utf8.manifest")
+ endif()
endforeach()
install(TARGETS ${TOOL_BINARIES} RUNTIME DESTINATION "${CMAKE_INSTALL_BINDIR}")
It adds a cmake/windows_utf8.manifest
file (taken from the Microsoft docs linked above), then uses cmake's target_sources
to add that .manifest to all the tool binaries. Cmake knows how "to honor *.manifest source files with MSVC tools. Manifest files named as sources of .exe and .dll targets will be merged with linker-generated manifests and embedded in the binary."
Caveats are:
edit: Here's a build artifact if anyone else wants to test it.
Here's another workaround, just using a short 8.3 file name. Short file names contain only MS-DOS compatible characters. This works also with other programs that are not compatible with Windows unicode.
cjxl.exe %~s1 temp.jxl %2 %3 %4 %5 %6 %7 %8 %9
move temp.jxl "%~dpn1.jxl"
For more info about these macros see for /?
This trick doesn't work on exFAT filesystems though. exFAT seems not to support short file names anymore.
imho, you should use UTF-16 to handle non alphabetic languages on Windows
note that for static string, L"foo" will be a wchar_t *, so just define L to nothing on unix
Here's a tentative patch that adds a manifest. I have tested the {c,d}jxl-x64-windows-static binaries from building with this, and can confirm they fix this issue for me.
This is definitely the way to go forward, but needs to be further reworked a little bit - using .manifest file directly as source only works for MSVC toolchain as you already pointed out, while the MinGW toolchain does not understand it. Instead, one just needs to wrap this .manifest in a .rc file, and then both MSVC and MinGW can deal with it.
See e.g. how libavif did it recently, inspired by RStudio.
imho, you should use UTF-16 to handle non alphabetic languages on Windows
This was the "old" way of doing it, UTF-8 should be preferred now as it is fully cross-platform and less code.
@kmilos Oh, great! So that gives a patch like this?
diff --git a/tools/CMakeLists.txt b/tools/CMakeLists.txt
index be1352cc..3e3df67a 100644
--- a/tools/CMakeLists.txt
+++ b/tools/CMakeLists.txt
@@ -332,6 +332,23 @@ foreach(BINARY IN LISTS TOOL_BINARIES)
endif()
set_target_properties(${BINARY} PROPERTIES LINK_FLAGS "${JXL_WASM_TOOLS_LINK_FLAGS}")
endif()
+
+ # Attach manifest that tells Windows to use UTF-8 for eg. fopen
+ if(WIN32)
+ # MinGW doesn't have a manifest tool (mt.exe), so we need to wrap
+ # utf8.manifest in a resource-definition script (.rc file).
+ target_sources(${BINARY} PRIVATE windows_utf8.rc)
+
+ # The MSVC linker (link.exe) fails with the following error if we
+ # don't use the /MANIFEST:NO linker option:
+ # /MANIFEST:EMBED,ID=1" failed (exit code 1123) with the following
+ # output:
+ # CVTRES : fatal error CVT1100: duplicate resource. type:MANIFEST,
+ # name:1, language:0x0409
+ if(NOT MINGW)
+ target_link_options(${BINARY} PRIVATE /MANIFEST:NO)
+ endif()
+ endif()
endforeach()
install(TARGETS ${TOOL_BINARIES} RUNTIME DESTINATION "${CMAKE_INSTALL_BINDIR}")
diff --git a/tools/windows_utf8.manifest b/tools/windows_utf8.manifest
new file mode 100644
index 00000000..dab929e1
--- /dev/null
+++ b/tools/windows_utf8.manifest
@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
+ <application>
+ <windowsSettings>
+ <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
+ </windowsSettings>
+ </application>
+</assembly>
diff --git a/tools/windows_utf8.rc b/tools/windows_utf8.rc
new file mode 100644
index 00000000..411efb7e
--- /dev/null
+++ b/tools/windows_utf8.rc
@@ -0,0 +1,3 @@
+#include <winuser.h>
+
+CREATEPROCESS_MANIFEST_RESOURCE_ID RT_MANIFEST "windows_utf8.manifest"
Can confirm this still works when building with MSVC. Can anybody easily test the MSYS2 binaries?
edit: Should it be if(NOT MINGW)
or if(MSVC)
?
Something like that, great! No time to check on MSYS2 yet though...
Should it be
if(NOT MINGW)
orif(MSVC)
?
Since you're inside the if(WIN32)
I don't think it matters, but maybe if(MSVC)
is clearer - note that MSVC
is also defined when compiling w/ clang-cl
.
- The user can set this globally for the whole system with the "Beta: Use Unicode UTF-8 for worldwide language support" setting in the Control Panel.
I tested this method and it is unacceptable for practical use, because then in localized programs the font is replaced with strange characters.
I tested this method and it is unacceptable for practical use
Luckily it's not needed to change the system-wide locale w/ the above patch.
Okay, that won't build with mingw. I get
[399/443] Building RC object tools/CMakeFiles/jxlinfo.dir/windows_utf8.rc.obj
FAILED: tools/CMakeFiles/jxlinfo.dir/windows_utf8.rc.obj
D:\a\_temp\msys64\mingw64\bin\windres.exe
-O coff
-DFJXL_ENABLE_AVX512=0
-DHWY_DISABLED_TARGETS="(HWY_SSSE3|HWY_AVX3|HWY_AVX3_SPR|HWY_AVX3_ZEN4|HWY_SVE|HWY_SVE2|HWY_SVE_256|HWY_SVE2_128|HWY_RVV)"
-D__DATE__=\"redacted\"
-D__TIMESTAMP__=\"redacted\"
-D__TIME__=\"redacted\"
-I D:/a/libjxl/libjxl
-I D:/a/libjxl/libjxl/third_party/highway
-I D:/a/libjxl/libjxl/build/lib/include
D:/a/libjxl/libjxl/tools/windows_utf8.rc
tools/CMakeFiles/jxlinfo.dir/windows_utf8.rc.obj
'HWY_AVX3' is not recognized as an internal or external command, operable program or batch file.
https://github.com/scurest/libjxl/actions/runs/8358808655/job/22880878058?pr=1#step:7:421
My best guess is the quotes in -DHWY_DISABLED_TARGETS="
should be escaped, like for -D__DATE__
, etc. Not sure why they aren't. Possibly related: https://stackoverflow.com/questions/45248034/cmake-generated-mingw-makefile-has-quoting-errors
that won't build with mingw
Confirmed the same error locally - MSYS2 UCRT64 bash shell, Ninja generator...
I guess windres
is more fussy than other frontends...
I am using W7 jap locale and it can handle japanese but not chinese,russian or korean or the characters " ! " and " ' ",havent tried others, just sharing my case.
Try the simple trick with short file names as I mentionem above, this should work in Windows 7.
@jendalinda It changes the outputs file's name, so it doesnt really fix. I just delete the characters that cause the error if I really want to convert it. And without putting short names in the output name path expansion it just cant write to the file.
@Theo1996
The file must be renamed after the conversions, taht's what move
does.
@jendalinda I understand that.
@scurest Have you tried also including
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 9b74537..e8e19dd 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -300,7 +300,7 @@ else ()
if (NOT JPEGXL_ENABLE_SIZELESS_VECTORS)
set(HWY_DISABLED_TARGETS "${HWY_DISABLED_TARGETS}|HWY_SVE|HWY_SVE2|HWY_SVE_256|HWY_SVE2_128|HWY_RVV")
endif()
- add_definitions(-DHWY_DISABLED_TARGETS=\(${HWY_DISABLED_TARGETS}\))
+ add_definitions(-DHWY_DISABLED_TARGETS="\(${HWY_DISABLED_TARGETS}\)")
endif()
# Machine flags.
Edit: Even better, drop the deprecated add_definitions
and use add_compile_options
w/ a generator that skips it for RC language, something like
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 9b74537..1eee58e 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -300,7 +300,7 @@ else ()
if (NOT JPEGXL_ENABLE_SIZELESS_VECTORS)
set(HWY_DISABLED_TARGETS "${HWY_DISABLED_TARGETS}|HWY_SVE|HWY_SVE2|HWY_SVE_256|HWY_SVE2_128|HWY_RVV")
endif()
- add_definitions(-DHWY_DISABLED_TARGETS=\(${HWY_DISABLED_TARGETS}\))
+ add_compile_options($<$<NOT:$<COMPILE_LANGUAGE:RC>>:-DHWY_DISABLED_TARGETS=\(${HWY_DISABLED_TARGETS}\)>)
endif()
# Machine flags.
@kmilos It builds, I don't know if it works. Would be great if someone who uses MinGW or Windows can test it.
@scurest That second approach w/ generator confirmed building and working in MSYS2 as well:
C:\msys64\opt\libjxl\bin>cjxl фото-Природа-озеро-горы-6946340.jpeg фото-Природа-озеро-горы-6946340.jxl -d 0 -e 8
JPEG XL encoder v0.10.2 aacecd24 [AVX2,SSE4,SSE2]
Note: Implicit-default for JPEG is lossless-transcoding. To silence this message, set --lossless_jpeg=(1|0).
Encoding [JPEG, lossless transcode, effort: 8]
Compressed to 2544.0 kB including container
Wohoo, can't wait until the next release, so I can kick my ugly workaround...
@scurest Thanks for looking into this! Could you make a PR if you have something that fixes the problem?
I have a log of pictures with russian, japanese and chinese filenames. Examples: 3d-art-красивые-картинки-art-пираты-6497178.jpeg or фото-Природа-озеро-горы-6946340.jpeg When I rename them using the CMD shell: ren фото-Природа-озеро-горы-6946340.jpeg 00000temp.jpeg I can convert them and rename it back afterwards. See attached screenshot. It does not happen für typical WESTERN unicode filenames with German Umlaut like ÖÄÜ.
That exact source picture on the internet: http://joyreactor.cc/post/4943227
Cannot test that for *nix, don't have a linux at home any more (last was about 2015). But since the normale cmd copy / xcopy / robocopy and all other windows command line tools can handle it it should be possible for the windows version of CJXL too.
kind regards,
Joachim Otahal