paketo-buildpacks / native-image

A Cloud Native Buildpack that creates native images from Java applications
Apache License 2.0

Ignoring locale settings and sun.jnu.encoding #344

Open Schaka opened 1 month ago

Schaka commented 1 month ago

I hope I'm in the right place and this isn't directly related to GraalVM. So please excuse me if I'm wasting your time. You can find all the code I'm talking about right here: https://github.com/Schaka/janitorr/tree/bazarr-support

The image is built using the Spring Boot bootBuildImage task via Gradle, and I'm passing these environment variables:

"BPE_DEFAULT_LANG" to "en_US.UTF-8",
"BPE_LANG" to "en_US.UTF-8",
"BPE_LC_ALL" to "en_US.UTF-8",
"JAVA_TOOL_OPTIONS" to """
    -Dsun.jnu.encoding=UTF-8
    -Dfile.encoding=UTF-8
""".trimIndent(),
"BP_NATIVE_IMAGE_BUILD_ARGUMENTS" to """
    -march=compatibility
    -H:+AddAllCharsets
    -Dsun.jnu.encoding=UTF-8
    -Dfile.encoding=UTF-8
""".trimIndent()

My host (Debian 12) has LANG set correctly and LC_ALL not set at all. Following the docs, I also passed these variables to the container via compose.yml.

According to the docs, output would otherwise not print correctly to the console (docker logs), but it definitely seems to. Granted, I use logback and not any direct prints, so there is a chance that fixes things magically.

services:
  janitorr:
    container_name: janitorr
    image: ghcr.io/schaka/janitorr:native-amd64-80-merge
    user: 1000:1000
    ports:
      - 8978:8978 # Technically, we don't publish any endpoints, so this isn't strictly required
    volumes:
      - /appdata/janitorr/config/application.yml:/workspace/application.yml
      - /share_media:/data
    environment:
      - LC_ALL=en_US.UTF-8
      - LANG=en_US.UTF-8

Yet, the second I use Path.of("a path with an ümläüt"), I run into the following exception:

java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /data/media/anime-movies/Nausicaä of the Valley of the Wind (1984) [imdbid-tt0087544]/Nausicaä of the Valley of the Wind (1984) [imdbid-tt0087544] - [Bluray-1080p][FLAC 2.0][x264].mkv
    at java.base@23/sun.nio.fs.UnixPath.encode(UnixPath.java:131) ~[com.github.schaka.janitorr.JanitorrApplicationKt:na]
    at java.base@23/sun.nio.fs.UnixPath.<init>(UnixPath.java:77) ~[com.github.schaka.janitorr.JanitorrApplicationKt:na]
    at java.base@23/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:312) ~[com.github.schaka.janitorr.JanitorrApplicationKt:na]
    at java.base@23/java.nio.file.Path.of(Path.java:148) ~[com.github.schaka.janitorr.JanitorrApplicationKt:na]
    at com.github.schaka.janitorr.mediaserver.AbstractMediaServerService.pathStructure$janitorr(AbstractMediaServerService.kt:71) ~[com.github.schaka.janitorr.JanitorrApplicationKt:na]
    at com.github.schaka.janitorr.mediaserver.AbstractMediaServerService.createLinks(AbstractMediaServerService.kt:99) ~[com.github.schaka.janitorr.JanitorrApplicationKt:na]

Is there something I'm missing here, or could this be a bug in GraalVM somehow? Looking at the code, UnixFileSystem definitely reads sun.jnu.encoding. The filepath is received as a valid UTF-8 string via REST.
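For readers following along, the exception is consistent with the path string being encoded with an ASCII-only charset. Below is a minimal Java sketch of that effect, assuming (as the stack trace suggests) that the path is encoded with whatever charset sun.jnu.encoding names. The class and method names are illustrative and not taken from Janitorr or the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper (hypothetical name); not JDK or Janitorr code.
final class JnuEncodingCheck {

    // True if every character of the text can be represented in the charset.
    static boolean encodable(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // "Nausicaä", written with an escape so the source file encoding does not matter.
        String name = "Nausica\u00e4";
        // ANSI_X3.4-1968 is an alias the JDK resolves to US-ASCII.
        Charset ascii = Charset.forName("ANSI_X3.4-1968");
        System.out.println(ascii.name() + " can encode: " + encodable(name, ascii));
        System.out.println("UTF-8 can encode: " + encodable(name, StandardCharsets.UTF_8));
    }
}
```

On a standard JDK the first check should report false and the second true, which matches the "unmappable characters" wording in the exception.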

Logging from within the image provides:

2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : Default charset UTF-8
2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : sun.jnu.encoding ANSI_X3.4-1968
2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : sun.stdout.encoding null
2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : sun.stderr.encoding null
2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : ENV JAVA_TOOL_OPTIONS null
2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : ENV LANG en_US.UTF-8
2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : ENV LANGUAGE null
2024-10-14T09:56:37.360Z  INFO 1 --- [           main] c.g.s.j.config.RuntimeEnvironment        : ENV LC_ALL en_US.UTF-8
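A report like the one above can be produced with a few lines of plain Java. This is only a sketch of the idea: the class name is hypothetical, it prints to stdout instead of going through logback, and it is not the actual RuntimeEnvironment class from the project.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper; mirrors the keys shown in the log output above.
final class EncodingReport {

    // Collects encoding-related properties and environment variables in a stable order.
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("Default charset", Charset.defaultCharset().name());
        for (String key : new String[] {"sun.jnu.encoding", "sun.stdout.encoding", "sun.stderr.encoding"}) {
            // Unset properties are recorded as the string "null".
            report.put(key, String.valueOf(System.getProperty(key)));
        }
        for (String key : new String[] {"JAVA_TOOL_OPTIONS", "LANG", "LANGUAGE", "LC_ALL"}) {
            report.put("ENV " + key, String.valueOf(System.getenv(key)));
        }
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " " + value));
    }
}
```

Running the same report in the locally built binary and in the buildpack-built image makes it easy to compare where the two environments diverge.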

dmikusa commented 1 month ago

Did you try with -J flag to native-image?

-J<flag> pass <flag> directly to the JVM running the image generator

So if you're setting additional args like ...

"BP_NATIVE_IMAGE_BUILD_ARGUMENTS" to """ -march=compatibility -H:+AddAllCharsets -Dsun.jnu.encoding=UTF-8 -Dfile.encoding=UTF-8

Then try -J-Dsun.jnu.encoding=UTF-8 and -J-Dfile.encoding=UTF-8 (instead of those args without the -J). That should pass those args through to the build time. I see some evidence of that helping similar issues here.
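To make the suggested change concrete, here is a small Java sketch that rewrites an argument list the way the comment above describes: every -D flag gets a -J prefix and everything else is left alone. The helper is purely illustrative (it is not part of the buildpack or of native-image), and whether the rewritten flags actually resolve the problem is not established here.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; only illustrates the "-D" to "-J-D" rewrite suggested above.
final class BuildArgs {

    // Prefix system-property flags with -J so they are passed to the JVM
    // running the image generator; other flags are returned unchanged.
    static List<String> forwardToBuilderJvm(List<String> args) {
        return args.stream()
                .map(arg -> arg.startsWith("-D") ? "-J" + arg : arg)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> original = List.of(
                "-march=compatibility",
                "-H:+AddAllCharsets",
                "-Dsun.jnu.encoding=UTF-8",
                "-Dfile.encoding=UTF-8");
        // One argument per line, as in BP_NATIVE_IMAGE_BUILD_ARGUMENTS.
        System.out.println(String.join("\n", forwardToBuilderJvm(original)));
    }
}
```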

In general, what I suggest with something like this is to get it working without buildpacks. So make it build and work when calling native-image directly (or with their gradle tools) on your machine. When it's working well that way, look at the flags you had to pass to native-image and then update BP_NATIVE_IMAGE_BUILD_ARGUMENTS accordingly.

The buildpack is just installing & running native-image for you. It attempts to add some basic arguments that you'll need, but beyond that, it's up to you to pass additional arguments through.

Schaka commented 1 month ago

I did test it without buildpacks about an hour ago and made a small sample project. I was about to update the issue and figured it may be buildpack related. I've also considered it may be a problem with Adoptium or Bellsoft.

Here's a small test project: graalvm-test.zip

If you replace the base image in the Dockerfile with 21 instead of 23, you can start it with any combination of parameters. They end up not mattering and you always get the same error. With 23 it just works and you don't have to set encoding parameters at all.

I've given the -J flag a try and had no success.

I've managed to set `-Dsun.jnu.encoding=UTF-8` and read it back from within the image by adjusting the CMD (according to the docs), but the actual value seems to get completely ignored.

dmikusa commented 1 month ago

This doesn't seem like something that's specific to buildpacks, if I'm following your tests here. If you can reproduce it using the standard Dockerfiles, that's a behavior with native-image itself.

All I can suggest is that we do have Java 23 available in buildpacks: https://github.com/paketo-buildpacks/bellsoft-liberica/releases/tag/v10.9.0 has Java 23, and that was pulled into last week's release, https://github.com/paketo-buildpacks/java/releases/tag/v16.1.0. So if using Java 23 works with your Dockerfile sample, I'd bet it works with buildpacks too.

Schaka commented 1 month ago

So I had already been using Java 23 for a while. This is my log output in that regard:

$JAVA_TOOL_OPTIONS                                                                        the JVM launch flags
    [creator]         Using Java version 23 from BP_JVM_VERSION
    [creator]       BellSoft Liberica NIK 23.0.0: Contributing to layer

I now added "paketobuildpacks/oracle". I was hoping there was a way to use Oracle's GraalVM as a base image directly and this seems to be it.

Unfortunately, the result is still the same. The resulting image cannot use Path.of("/umläut") without an exception being thrown.
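The symptom can be checked in isolation with a tiny probe. The sketch below is a hypothetical stand-alone class, not the contents of the attached sample projects. It simply tries Path.of with a non-ASCII name and reports whether InvalidPathException is thrown, alongside the sun.jnu.encoding value the runtime sees.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

// Hypothetical probe; checks whether the runtime can build a Path from a given string.
final class PathEncodingProbe {

    // Returns false if Path.of rejects the string, e.g. because it cannot be encoded.
    static boolean accepts(String rawPath) {
        try {
            Path.of(rawPath);
            return true;
        } catch (InvalidPathException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "/umläut", written with an escape so the source file encoding does not matter.
        String withUmlaut = "/uml\u00e4ut";
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("Path.of accepts non-ASCII: " + accepts(withUmlaut));
    }
}
```

Compiled to a native image, this gives a quick yes/no answer for a given builder and run image without involving the rest of the application.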

If I can recreate it in a sample project specifically built around paketo's buildpacks, will you look into it?

dmikusa commented 1 month ago

What we'd need to look into this is something like your sample project from before that works when you build with a Dockerfile or just on the local machine, but does not work when building the same source code with buildpacks. If you have a sample like that, I can take a look.

Schaka commented 1 month ago

native-image-error.zip

If you do gradle bootBuildImage and docker run native-image-error, you'll see the error. It will not error if you just run it locally, only in the image produced by buildpacks.

You can even try docker run native-image-error -Dsun.jnu.encoding=UTF-8. I think either the base image doesn't accept or pass on the LANG and LC_ALL env variables, or it is one where GraalVM doesn't interpret them correctly at build time.

As you can see in my previous example, it works just fine using Oracle's base image with javac and native-image. I'm at a bit of a loss trying to figure out what's different.

You guys do amazing work, so I wouldn't be surprised if I fucked up somehow.

dmikusa commented 1 month ago

A couple of observations...

Schaka commented 1 month ago

I've also created an issue over at GraalVM. The package they use seems to be glibc-all-langpacks.

I can't find the Ubuntu/Debian equivalent. Maybe they can shine a light on it.

The locales package is available on Ubuntu 22, but that should be installed by default.

Having a quick look, at most I can tell that Oracle Linux 9 supplies glibc-all-langpacks 2.34, whereas Ubuntu's locale uses 2.35. Hopefully that one minor version isn't what breaks it here.

Edit: Out of curiosity, I've created a builder from a build and run image using graalvm-community:23, specifically added the glibc-all-langpacks and it still results in the same error. I'm guessing I'd have to adjust the buildpacks too.

Schaka commented 1 month ago

I ended up using --patch-module: I recompiled the sun.nio.fs.Util class myself with UTF-8 forced and copied it (hackily) into a folder structure that the compiler accepts. The native image accepted it without issue.

native-image-error.zip

It's incredibly hacky but may help anyone else who needs a temporary (haha) solution. The real problem still needs fixing, but I suspect it'll be a while before the GraalVM team gives a definitive answer as to what is actually required and how the system property is set under the hood.

dmikusa commented 1 month ago

Thanks for sharing! I'm subscribed on the GraalVM issue, so I'll keep an eye on what they say and if there's anything we can do to make this just work, I'll open up issues.