timob / jnigi

Golang Java JNI library
BSD 2-Clause "Simplified" License
163 stars, 44 forks

Loading JVM from a path containing unicode characters #74

Closed: mlaggner closed this issue 1 year ago

mlaggner commented 1 year ago

I noticed that loading the JVM from a path containing a Unicode character (e.g. Séries) fails. If I rename the path so that it uses only ASCII characters, loading succeeds.

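As far as I found out, the Windows API function LoadLibrary comes in two flavours, one of which is selected at compile time: LoadLibraryA (the ANSI variant, which interprets the string in the system code page; this is the default) and LoadLibraryW (the Unicode, wide-character variant).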
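To see why the ANSI variant trips over a path like this, recall that Go strings are UTF-8, so "é" is the two bytes 0xC3 0xA9, while an ANSI function interprets its argument in the system code page (commonly Windows-1252, where "é" would be the single byte 0xE9). The minimal sketch below assumes a Windows-1252 code page and handles only the bytes whose Windows-1252 meaning equals the Latin-1 code point (0x00-0x7F and 0xA0-0xFF). The helper name `decodeCP1252Subset` and the example path are hypothetical; the sketch just shows what such a function would effectively "see".

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// decodeCP1252Subset interprets raw bytes the way an ANSI ("A") API on a
// Windows-1252 system would. Hypothetical helper: it only handles bytes whose
// Windows-1252 meaning equals the Latin-1 code point (0x00-0x7F, 0xA0-0xFF)
// and rejects 0x80-0x9F, which Windows-1252 maps to other characters.
func decodeCP1252Subset(b []byte) (string, error) {
	var sb strings.Builder
	for _, c := range b {
		if c >= 0x80 && c < 0xA0 {
			return "", errors.New("byte in 0x80-0x9F range is not handled by this sketch")
		}
		sb.WriteRune(rune(c))
	}
	return sb.String(), nil
}

func main() {
	path := `C:\Séries\jre\bin\server\jvm.dll` // hypothetical example path

	fmt.Printf("% X\n", []byte("é")) // UTF-8 bytes of "é": C3 A9

	seen, err := decodeCP1252Subset([]byte(path))
	if err != nil {
		panic(err)
	}
	fmt.Println(seen) // C:\SÃ©ries\jre\bin\server\jvm.dll
}
```

The mangled "SÃ©ries" no longer names an existing directory, which would be consistent with the load failing only for non-ASCII paths.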

After researching for a few more hours, I found that I need to define a preprocessor macro to force the use of the Unicode variant, which I did like this (windows.go):

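/*
#include <jni.h>

// we need to compile with Unicode support on Windows;
// UNICODE/_UNICODE must be defined before <windows.h> is included
#ifndef UNICODE
#define UNICODE
#endif

#ifndef _UNICODE
#define _UNICODE
#endif

#include <windows.h>

// ... (rest of the existing preamble)
*/
import "C"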

This leads to some compiler errors. After fiddling a bit more with the parameters for LoadLibrary (the Unicode variant does not accept a C.char string; it expects a wide-character LPCWSTR), I got it compiling again, but the result did not work either.

Did I miss anything else or am I simply wrong?

mlaggner commented 1 year ago

I made some more changes and the JVM loads now, but I cannot start the main method ("could not start main method - Java exception occured. check stderr/logcat"). My current approach is: https://github.com/mlaggner/jnigi/blob/master/windows.go

It looks like the call to main needs to be adapted to the Windows C.wstring_t?

timob commented 1 year ago

Could you use https://pkg.go.dev/golang.org/x/sys@v0.11.0/windows#LoadLibrary instead of the existing call? It does seem to use LoadLibraryW, so another approach would be to call LoadLibraryW directly.

Regarding "looks like the call to main needs to be adopted to the Windows C.wstring_t?": not sure what you mean by that.

mlaggner commented 1 year ago

Thanks for the hint about windows.LoadLibrary, but it causes problems in the subsequent calls (windows.LoadLibrary returns a different type than the one your code uses). Unfortunately I have no clue what I am doing here (I am a Java dev, not a C/C++ dev) and I do not have a Windows development environment either...

Do you have any further suggestions on how to change that?

timob commented 1 year ago

OK, I've got a solution here:

https://github.com/timob/jnigi/tree/windows_unicode_dll_path_fix

Hope that works for you; it's working for me.

mlaggner commented 1 year ago

Thanks, now I get as far as loading the JVM, but the JVM throws an exception (which is not logged anywhere):

"could not start main method - Java exception occured."

I need to review the code to find out what is failing.

BTW: without the Unicode character, the JVM starts fine with your changes! 👍

mlaggner commented 1 year ago

Could this be a problem? https://github.com/timob/jnigi/blob/master/cinit.go#L34

The JVM args contain the class path (-Djava.class.path), which also contains Unicode characters. I could also imagine JVM parameters (and/or application arguments) containing Unicode characters.

timob commented 1 year ago

The JNI functions use UTF-8 strings, the same as Go, so the code you linked should not be a problem.

Googling around, there do seem to be general problems on Windows with Unicode characters in class paths when using OpenJDK.

mlaggner commented 1 year ago

I just tried calling javaw with the same path and parameters, and that works:

jre\bin\javaw.exe -classpath C:\Séries\main.jar;C:\Séries\lib\*;C:\Séries\addons\* -Xms64m -Xmx512m -Xss512k -XX:+IgnoreUnrecognizedVMOptions -XX:+UseG1GC -XX:+UseStringDeduplication -Dsun.java2d.renderer=sun.java2d.marlin.MarlinRenderingEngine -splash:splashscreen.png -Djava.net.preferIPv4Stack=true -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 -Djna.nosys=true com.Main

timob commented 1 year ago

Looking at https://stackoverflow.com/questions/20052455/jni-start-jvm-with-unicode-support, it seems that command-line launchers like javaw do the encoding conversion somehow, but this is not done during JNI invocation.

The suggestion from that Stack Overflow answer is to call System.setProperty with the class path after you create the VM.

timob commented 1 year ago

Regarding my previous comment ("The suggestion from that stackoverflow is to call System.setProperty with the class path, after you create the VM."): that's wrong. I mean, you can set the property, but I don't think it affects where classes are found after the JVM has started. There are ways of setting the class path dynamically, though.

mlaggner commented 1 year ago

I had a look at the implementation of java.exe in OpenJDK: https://github.com/openjdk/jdk17/blob/master/src/java.base/share/native/libjli/java.c#L1518

It looks like they pass the JVM args directly to CreateJavaVM. I haven't found out yet which data types are used there.

mlaggner commented 1 year ago

Now I just found this: https://github.com/openjdk/jdk17/blob/master/src/java.base/windows/native/libjli/cmdtoargs.c#L86

This looks like the java CLI executable converts the JVM args to another format. Am I right?

timob commented 1 year ago

Good news and bad news. The good news is that I've got your example working. The bad news is that the JVM on Windows seems to expect arguments encoded in the system code page, not UTF-8 (thanks for pointing to the code above), so you are limited to that character set. For Latin-script languages this is usually Windows-1252.

So if you prepare the arguments like this:

import "golang.org/x/text/encoding/charmap"
...
// Convert each JVM argument from UTF-8 (Go's native encoding) to the
// Windows code page the JVM expects (Windows-1252 here).
winEnc := charmap.Windows1252.NewEncoder()
winStr, err := winEnc.String(arg)
if err != nil {
    panic("charmap.Encoder error: " + err.Error())
}
...
jnigi.CreateJVM(jnigi.NewJVMInitArgs(false, true, jnigi.DEFAULT_VERSION, []string{winStr}))

Your example with "Séries" in it will work.

I think it's probably beyond the scope of JNIGI to detect the current code page Windows is using and then do the encoding.
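For arguments that only use characters Windows-1252 shares with Latin-1 (U+0000-U+007F and U+00A0-U+00FF, which covers "é"), the encoding step above can also be sketched without golang.org/x/text. This is a hypothetical, stdlib-only stand-in for `charmap.Windows1252.NewEncoder()`, not part of JNIGI; it assumes the target code page is Windows-1252 and reports anything else (such as "€" or non-Latin scripts) as unencodable rather than guessing.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeCP1252Subset converts a Go (UTF-8) string to Windows-1252 bytes.
// Hypothetical helper: only runes that Windows-1252 shares with Latin-1
// (U+0000-U+007F and U+00A0-U+00FF) are supported; all others are rejected.
func encodeCP1252Subset(s string) (string, error) {
	var sb strings.Builder
	for i, r := range s {
		if r > 0xFF || (r >= 0x80 && r < 0xA0) {
			return "", fmt.Errorf("rune %q at byte %d cannot be encoded by this sketch", r, i)
		}
		sb.WriteByte(byte(r))
	}
	return sb.String(), nil
}

// encodeJVMArgs applies the conversion to every JVM option string before
// they are handed to the JVM init args (hypothetical helper).
func encodeJVMArgs(args []string) ([]string, error) {
	out := make([]string, 0, len(args))
	for _, a := range args {
		enc, err := encodeCP1252Subset(a)
		if err != nil {
			return nil, err
		}
		out = append(out, enc)
	}
	return out, nil
}

func main() {
	args, err := encodeJVMArgs([]string{`-Djava.class.path=C:\Séries\main.jar`, "-Xmx512m"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", args)
}
```

Failing loudly on unsupported characters seems preferable to silently substituting them, since a substituted character would produce a class path that points nowhere.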

mlaggner commented 1 year ago

Many thanks for your hints so far. I will do some tests over the weekend.

myron0815 commented 1 year ago

Hi, second dev here. Thanks for the pointer on the code page!

I checked my Windows instance with chcp, and it returns 65001, which is UTF-8. I enabled this (still beta?) feature as described here: https://superuser.com/a/1435645. I did that mainly to get all characters displayed correctly in the Windows console. Although it is said to be problematic sometimes, I've found no issues so far (I had even forgotten that I'd set it).

That being said, the current app starts without any problems here, with some Unicode characters in the path!

If we now hard-code one of the many code pages (https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers), will it cause more problems for others than it solves? E.g., will that break my UTF-8 installation? (I will test once mlaggner has a build ready.)
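A small illustration of why the concern is justified, assuming the JVM decodes arguments using the active code page: if arguments were always encoded as Windows-1252, a machine whose active code page is 65001 (UTF-8) would receive "é" as the lone byte 0xE9, which is not valid UTF-8. The hypothetical helper below only checks byte validity with the standard library; it does not query Windows or the JVM.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// wellFormedForUTF8CodePage reports whether an argument's bytes are valid
// UTF-8, i.e. whether a process whose active code page is 65001 could decode
// it. Hypothetical helper for illustration.
func wellFormedForUTF8CodePage(arg string) bool {
	return utf8.ValidString(arg)
}

func main() {
	utf8Arg := `C:\Séries\main.jar`       // Go string literal: UTF-8, "é" = C3 A9
	cp1252Arg := "C:\\S\xe9ries\\main.jar" // same path after Windows-1252 encoding, "é" = E9

	fmt.Println(wellFormedForUTF8CodePage(utf8Arg))   // true
	fmt.Println(wellFormedForUTF8CodePage(cp1252Arg)) // false
}
```

So a fixed Windows-1252 conversion would likely help on Windows-1252 machines while breaking UTF-8 ones, which is why the encoding would have to follow the machine's actual code page.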

mlaggner commented 1 year ago

@timob, thanks for your hints! Combining your approach with the results from @myron0815 led me to the following doc: https://github.com/MicrosoftDocs/windows-dev-docs/blob/docs/hub/apps/design/globalizing/use-utf8-code-page.md

According to this Microsoft document, we can set the following manifest entry:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
  <application>
    <windowsSettings>
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>

This makes UTF-8 the active code page for the whole application on Windows version 1903 (May 2019 Update) or newer. We've tested this on a machine using Windows-1252 and on one using UTF-8, and both worked as expected.

So from our point of view, we will not need to encode the class path (and arguments). Instead, we set the manifest values and will require our users either to run Windows version 1903 or newer or to avoid Unicode characters in their class path.
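To enforce the "Windows 1903 or newer" requirement at startup, an application could compare the OS build number against 18362, the build number of Windows 10 version 1903. The sketch keeps only the comparison, so it is portable; on Windows the version numbers could come, for example, from `RtlGetVersion` in golang.org/x/sys/windows, which is not shown here. The constant and function names are hypothetical.

```go
package main

import "fmt"

// minUTF8ManifestBuild is the build number of Windows 10 version 1903, the
// first release that honours <activeCodePage>UTF-8</activeCodePage>.
const minUTF8ManifestBuild = 18362

// supportsUTF8ActiveCodePage reports whether a Windows version is new enough
// for the manifest setting. Windows 11 still reports major version 10, so
// the build number is what distinguishes releases.
func supportsUTF8ActiveCodePage(major, build uint32) bool {
	return major > 10 || (major == 10 && build >= minUTF8ManifestBuild)
}

func main() {
	fmt.Println(supportsUTF8ActiveCodePage(10, 17763)) // Windows 10 1809: false
	fmt.Println(supportsUTF8ActiveCodePage(10, 18362)) // Windows 10 1903: true
	fmt.Println(supportsUTF8ActiveCodePage(10, 22631)) // a Windows 11 build: true
}
```

On older builds the manifest entry is simply ignored, so such a check could be used to warn users that non-ASCII paths are unsupported on their system.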

Your help is really appreciated (you are the hero of the day for us :D)

timob commented 1 year ago

Hey, nice! That's interesting regarding how UTF-8 works on Windows. I've recently gone back to developing on Windows, so I'm learning along the way.