sgerrand / alpine-pkg-glibc

A glibc compatibility layer package for Alpine Linux

UTF-8 locale for glibc-dependent applications #5

Closed frol closed 8 years ago

frol commented 8 years ago

Your artifacts do not include any UTF-8 locale. As a result, applications that rely on LC_* information print question marks instead of non-Latin characters, and in the case of OracleJDK this can even prevent code from compiling. There is an initiative to implement a C.UTF-8 locale. While it is not in upstream glibc, I have succeeded in building a custom C.UTF-8 locale by copying the minimum required set of files from my Arch Linux system:

# tree /usr/glibc-compat/share/i18n/
├── charmaps
│   └── UTF-8.gz
└── locales
    └── POSIX
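The copy step described above can be sketched as a small shell helper. This is a minimal sketch, not the author's actual procedure: the function name `copy_minimal_i18n` is hypothetical, and the source path in the usage line assumes the host keeps glibc's i18n data under /usr/share/i18n.

```shell
# Hypothetical helper: copy only the files needed for C.UTF-8 from a glibc
# i18n tree ($1, e.g. /usr/share/i18n on the host) into a new tree ($2).
copy_minimal_i18n() {
    src="$1"
    dest="$2"
    mkdir -p "$dest/charmaps" "$dest/locales" &&
    cp "$src/charmaps/UTF-8.gz" "$dest/charmaps/" &&
    cp "$src/locales/POSIX" "$dest/locales/"
}
```

Usage, under the path assumption above: `copy_minimal_i18n /usr/share/i18n ./i18n`.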

Once the files are in place, I can generate the locale:

# /usr/glibc-compat/bin/localedef --force --inputfile POSIX --charmap UTF-8 C.UTF-8
LC_MONETARY: value of field `int_curr_symbol' has wrong length
No definition for LC_PAPER category found
No definition for LC_NAME category found
No definition for LC_ADDRESS category found
No definition for LC_TELEPHONE category found
No definition for LC_MEASUREMENT category found
No definition for LC_IDENTIFICATION category found

NOTE: I had to pass --force to localedef, but the resulting locale seems to work fine anyway.
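Since localedef prints warnings here, it may help to confirm that the locale was actually written. The sketch below checks a list of locale names for a match. The helper name `has_locale` is hypothetical, and the usage line assumes that the glibc-bin package ships a `locale` binary next to `localedef`, which this thread does not confirm.

```shell
# Hypothetical helper: succeed if the locale name in $1 appears in the list
# read from stdin (one name per line). Names are compared case-insensitively
# with '-' removed, because glibc typically lists C.UTF-8 as C.utf8.
has_locale() {
    want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | sed 's/-//g')
    tr 'A-Z' 'a-z' | sed 's/-//g' | grep -qxF "$want"
}
```

Usage, under the assumption above: `/usr/glibc-compat/bin/locale -a | has_locale C.UTF-8 && echo "C.UTF-8 is available"`.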

Reproduction:

$ echo -e 'public class Main { public static void main(String[] args) { System.out.println("Ф"); } }' > Main.java
$ docker run --rm --volume "$(pwd)":/mnt --workdir /mnt frolvlad/alpine-oraclejdk8:slim sh -c 'javac Main.java && java Main'
Main.java:1: error: unmappable character for encoding ASCII
public class Main { public static void main(String[] args) { System.out.println("??"); } }
                                                                                 ^
Main.java:1: error: unmappable character for encoding ASCII
public class Main { public static void main(String[] args) { System.out.println("??"); } }
                                                                                  ^
2 errors

The following Dockerfile fixes the issue (see the last three instructions):

FROM alpine:3.3

RUN apk add --no-cache --virtual=build-dependencies wget ca-certificates && \
    export ALPINE_GLIBC_BASE_URL="https://circle-artifacts.com/gh/andyshinn/alpine-pkg-glibc/21/artifacts/0/home/ubuntu/alpine-pkg-glibc/packages/builder/x86_64" && \
    export ALPINE_GLIBC_PACKAGE="glibc-2.22-r5.apk" && \
    export ALPINE_GLIBC_BIN_PACKAGE="glibc-bin-2.22-r5.apk" && \
    wget "$ALPINE_GLIBC_BASE_URL/$ALPINE_GLIBC_PACKAGE" "$ALPINE_GLIBC_BASE_URL/$ALPINE_GLIBC_BIN_PACKAGE" && \
    apk add --no-cache --allow-untrusted "$ALPINE_GLIBC_PACKAGE" "$ALPINE_GLIBC_BIN_PACKAGE" && \
    echo 'hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4' >> /etc/nsswitch.conf && \
    apk del build-dependencies && \
    rm "$ALPINE_GLIBC_PACKAGE" "$ALPINE_GLIBC_BIN_PACKAGE"

ENV JAVA_VERSION=8 \
    JAVA_UPDATE=72 \
    JAVA_BUILD=15 \
    JAVA_HOME="/usr/lib/jvm/default-jvm"

RUN apk add --no-cache --virtual=build-dependencies wget ca-certificates && \
    cd "/tmp" && \
    wget --header "Cookie: oraclelicense=accept-securebackup-cookie;" \
        "http://download.oracle.com/otn-pub/java/jdk/${JAVA_VERSION}u${JAVA_UPDATE}-b${JAVA_BUILD}/jdk-${JAVA_VERSION}u${JAVA_UPDATE}-linux-x64.tar.gz" && \
    tar -xzf "jdk-${JAVA_VERSION}u${JAVA_UPDATE}-linux-x64.tar.gz" && \
    mkdir -p "/usr/lib/jvm" && \
    mv "/tmp/jdk1.${JAVA_VERSION}.0_${JAVA_UPDATE}" "/usr/lib/jvm/java-${JAVA_VERSION}-oracle" && \
    ln -s "java-${JAVA_VERSION}-oracle" "$JAVA_HOME" && \
    ln -s "$JAVA_HOME/bin/"* "/usr/bin/" && \
    rm -rf "$JAVA_HOME/"*src.zip && \
    rm -rf "$JAVA_HOME/lib/missioncontrol" \
           "$JAVA_HOME/lib/visualvm" \
           "$JAVA_HOME/lib/"*javafx* \
           "$JAVA_HOME/jre/lib/plugin.jar" \
           "$JAVA_HOME/jre/lib/ext/jfxrt.jar" \
           "$JAVA_HOME/jre/bin/javaws" \
           "$JAVA_HOME/jre/lib/javaws.jar" \
           "$JAVA_HOME/jre/lib/desktop" \
           "$JAVA_HOME/jre/plugin" \
           "$JAVA_HOME/jre/lib/"deploy* \
           "$JAVA_HOME/jre/lib/"*javafx* \
           "$JAVA_HOME/jre/lib/"*jfx* \
           "$JAVA_HOME/jre/lib/amd64/libdecora_sse.so" \
           "$JAVA_HOME/jre/lib/amd64/"libprism_*.so \
           "$JAVA_HOME/jre/lib/amd64/libfxplugins.so" \
           "$JAVA_HOME/jre/lib/amd64/libglass.so" \
           "$JAVA_HOME/jre/lib/amd64/libgstreamer-lite.so" \
           "$JAVA_HOME/jre/lib/amd64/"libjavafx*.so \
           "$JAVA_HOME/jre/lib/amd64/"libjfx*.so && \
    rm -rf "$JAVA_HOME/jre/bin/jjs" \
           "$JAVA_HOME/jre/bin/keytool" \
           "$JAVA_HOME/jre/bin/orbd" \
           "$JAVA_HOME/jre/bin/pack200" \
           "$JAVA_HOME/jre/bin/policytool" \
           "$JAVA_HOME/jre/bin/rmid" \
           "$JAVA_HOME/jre/bin/rmiregistry" \
           "$JAVA_HOME/jre/bin/servertool" \
           "$JAVA_HOME/jre/bin/tnameserv" \
           "$JAVA_HOME/jre/bin/unpack200" \
           "$JAVA_HOME/jre/lib/ext/nashorn.jar" \
           "$JAVA_HOME/jre/lib/jfr.jar" \
           "$JAVA_HOME/jre/lib/jfr" \
           "$JAVA_HOME/jre/lib/oblique-fonts" && \
    apk del build-dependencies && \
    rm "/tmp/"*

# Charset changes are here:
# =========================

COPY ./i18n /usr/glibc-compat/share/i18n

RUN mkdir /usr/glibc-compat/lib/locale && \
    /usr/glibc-compat/bin/localedef --force --inputfile POSIX --charmap UTF-8 C.UTF-8 || true

ENV LANG=C.UTF-8

Fixed version output:

$ docker run --rm --volume "$(pwd)":/mnt --workdir /mnt oraclejdk8-with-C-UTF8-locale sh -c 'javac Main.java && java Main'
Ф

andyshinn commented 8 years ago

I'm pretty sure I was stripping out the i18n and locale stuff because it was so large (https://github.com/andyshinn/alpine-pkg-glibc/blob/master/APKBUILD#L25). So, I think the easy fix for this is to just have another sub-package (glibc-locale or glibc-i18n) that has the /usr/glibc-compat/share/i18n bits. Does that solve this issue?

frol commented 8 years ago

I didn't ask for i18n. The POSIX and UTF-8 files take less than 1MB. A package with all locales might also be a good idea, but I would still prefer to have C.UTF-8 in the basic packaging.

andyshinn commented 8 years ago

Isn't the i18n stuff a requirement for generating locales? Or are you suggesting to generate them and package them so they can be installed without needing to be generated?

There are a couple paths to take that I can see:

I'm wondering which would be the most flexible. Which do you prefer?

frol commented 8 years ago

Oh, I'm sorry. There is a file named i18n in the i18n/locales/ folder, and I thought you meant that file, which is not required.

Here is the size of the i18n folder with charmaps/UTF-8.gz and locales/POSIX:

$ du -h ./i18n
372K    i18n/charmaps
16K     i18n/locales
392K    i18n

392K is not so much.

I wouldn't ship pre-generated locales, as all generated locales end up in the /usr/glibc-compat/lib/locale/locale-archive binary file.

Having separate packages for each locale might be too much of a hassle since on my Arch Linux system all locales take 9.7MB altogether.

However, an idea has just come to mind: we could put all locales in a separate package, which I would install in my Dockerfile, use to generate the C.UTF-8 locale, and then remove, since I cannot imagine anyone re-generating locales inside a Docker container.

andyshinn commented 8 years ago

OK, I plan to create a glibc-i18n package that can be used with localedef to generate the locale and then be removed.
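The install, generate, remove workflow discussed here could look roughly like the sketch below. It is an illustration under assumptions, not the package's documented procedure: the `run` and `generate_c_utf8` helpers are hypothetical, the .apk path is whatever file was downloaded locally, and the package is assumed to install under the name glibc-i18n. The localedef invocation is the one from the original report.

```shell
# Print commands instead of running them when DRY_RUN is set (useful for review).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumption: $1 is the path to a locally downloaded glibc-i18n .apk file.
# localedef may exit non-zero on warnings even though the locale is written
# (the Dockerfile in the original report appends `|| true` for this reason),
# so its exit status is ignored here.
generate_c_utf8() {
    i18n_apk="$1"
    run apk add --no-cache --allow-untrusted "$i18n_apk" &&
    { run /usr/glibc-compat/bin/localedef --force --inputfile POSIX --charmap UTF-8 C.UTF-8 || true; } &&
    run apk del glibc-i18n
}
```

Setting `DRY_RUN=1` before calling `generate_c_utf8 ./glibc-i18n.apk` prints the three commands without touching the system.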

Another idea: is there a file or some other existing way to know the desired locale? We could add a package trigger that automatically generates the locale after glibc-i18n is installed.

frol commented 8 years ago

In Arch Linux, they use /etc/locale.conf. However, there is no "standard" way of specifying this, and it is the init service that is responsible for setting up the locale for the whole system. Thus, for Docker containers, I will have to set the LANG environment variable as a step in the Dockerfile and also put it in /etc/environment (in case a user runs su/sudo inside the container).
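Recording LANG in /etc/environment can be sketched as follows. The helper name `set_lang` is hypothetical, and whether su or sudo actually reads /etc/environment depends on how the image is set up, so treat this as an illustration only.

```shell
# Hypothetical helper: write LANG=<value> to an environment file, replacing
# any existing LANG= line so that repeated runs do not add duplicates.
# $1 is the locale name; $2 is the file (defaults to /etc/environment).
set_lang() {
    lang="$1"
    env_file="${2:-/etc/environment}"
    touch "$env_file"
    # grep -v exits non-zero when no lines remain, hence the `|| true`.
    grep -v '^LANG=' "$env_file" > "$env_file.tmp" || true
    echo "LANG=$lang" >> "$env_file.tmp"
    mv "$env_file.tmp" "$env_file"
}
```

In a Dockerfile this could run as `set_lang C.UTF-8` inside a RUN step, alongside the `ENV LANG=C.UTF-8` instruction shown earlier.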

andyshinn commented 8 years ago

Give https://github.com/andyshinn/alpine-pkg-glibc/releases/tag/pre a try. I added some information to the README as well.

frol commented 8 years ago

This one seems to be fixed! Great!