toolbx-images / images

Community maintained container images to use with toolbx and distrobox
https://containertoolbx.org/
Apache License 2.0

Arch image errors with `manpath: can't set the locale; make sure $LC_* and $LANG are correct` (but runs anyway) #76

Closed mogoh closed 8 months ago

mogoh commented 1 year ago

Image and version of the image where the issue happens

quay.io/toolbx-images/archlinux-toolbox:latest

Describe the bug

When I enter the Arch Linux image, I get the error above. Entering the container works anyway.

Reproduction steps

$ toolbox create --image quay.io/toolbx-images/archlinux-toolbox:latest
Created container: archlinux-toolbox-latest
Enter with: toolbox enter archlinux-toolbox-latest
$ toolbox enter archlinux-toolbox-latest
manpath: can't set the locale; make sure $LC_* and $LANG are correct

Host distribution and version, toolbx and podman versions

$ uname -a
Linux e14 6.2.11-300.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 13 20:27:09 UTC 2023 x86_64 GNU/Linux
$ toolbox --version
toolbox version 0.0.99.4
$ podman --version
podman version 4.5.0

Foxboron commented 1 year ago

What's the output of locale outside and inside the container?

mogoh commented 1 year ago

This:

$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

Foxboron commented 1 year ago

This seems like a configuration issue on your Linux distribution, unrelated to either toolbox or the Arch Linux container.

LC_ALL shouldn't be empty and it seems like the locale hasn't been properly generated on the install.

mogoh commented 1 year ago

Huh, strange. I'll investigate. Thanks for the hint.

mogoh commented 1 year ago

Are you sure that LC_ALL should not be empty? Wherever I look, it seems quite common for it to be empty.

Foxboron commented 1 year ago

The issue isn't necessarily that LC_ALL is empty; there are fallback locales. The problem is the three errors that locale is displaying.
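The fallback mentioned here can be sketched in a few lines of shell. This is only an illustration of the POSIX lookup order (LC_ALL, then the category's own variable, then LANG); the function name effective_locale is made up for this example, and falling back to "C" when nothing is set is an assumption about the implementation default.

```sh
# effective_locale CATEGORY
# Prints the locale name that applies to CATEGORY (e.g. LC_TIME), following
# the POSIX precedence: LC_ALL, then the category variable, then LANG.
# Empty variables count as unset; "C" is assumed as the last resort.
effective_locale() (
    category=$1
    # Look up the variable whose name is stored in $category.
    eval "category_value=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
)
```

With the values from the report (LC_ALL empty, LANG=de_DE.UTF-8), every category resolves to de_DE.UTF-8, so an empty LC_ALL is harmless by itself. The errors appear because that resolved locale has no data inside the container.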

mogoh commented 1 year ago

Oh, I misread your comment, sorry. My locale on the host is different from the one inside the container.

Host / Outside the container:

$ locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

Inside the container:

$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

This is probably the same issue for the official fedora-toolbox image: https://github.com/containers/toolbox/issues/60

What they did was add:

RUN dnf -y swap glibc-minimal-langpack glibc-all-langpacks

I don't know what the Arch equivalent would be.

Foxboron commented 1 year ago

Arch doesn't pre-build languages, so there is no Arch equivalent. Should probably hardcode the LC_* to C.UTF-8 inside the containers I think.

mogoh commented 1 year ago

Sorry, but I don't know how to do that.

This is the /etc/locale.conf inside the container:

$ cat /etc/locale.conf 
LANG=C.UTF-8

I also tried changing it to plain LANG=C but it did not help.

When I inspect the running container, I find that these are the environment variables:

    "Env": [
      "TOOLBOX_PATH=/usr/bin/toolbox",
      "XDG_RUNTIME_DIR=/run/user/1000",
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "TERM=xterm",
      "container=podman",
      "LANG=C.UTF-8",
      "HOSTNAME=toolbox",
      "HOME=/var/home/mogoh"
    ],

So I don't know where the locales are actually set.

debarshiray commented 1 year ago

I was playing with the Arch Linux image before merging it as one of the supported ones, and I noticed this too.

This is probably the same issue for the official fedora-toolbox image: https://github.com/containers/toolbox/issues/60

Yes, that was my immediate thought too. Although if you see my comments on that issue, you will realize that I don't have a very deep understanding of how all this works.

What they did is adding:

RUN dnf -y swap glibc-minimal-langpack glibc-all-langpacks

In Fedora's glibc packaging, glibc-minimal-langpack is an empty package. It contains no files, only metadata (Provides: glibc-langpack) indicating that some language packs are available: specifically the C, POSIX and C.UTF-8 locales, which are already built into the main glibc package. That metadata can be used to satisfy the glibc-langpack requirement of other packages.

The glibc-all-langpacks package is the one that contains all the other locales (like en_US.UTF-8) that people actually use, and it also has the same metadata (Provides: glibc-langpack).

Hence the packages can be interchanged without breaking dependencies elsewhere.

debarshiray commented 1 year ago

Arch doesn't pre-build languages, so there is no Arch equivalent.

Is there a way for the user to later on install their desired locale? I did notice that the installer made me select the locales that I wanted to be available for use.

I know very little about how Arch works. That's why I am asking.

Maybe Toolbx can do some magic to make the locales from the host available to the container, but I don't know that will work across different glibc versions.

Should probably hardcode the LC_* to C.UTF-8 inside the containers I think.

Wouldn't en_US.UTF-8 be a better choice? It has become the de facto default for user interfaces, and hence might be the least surprising as an arbitrary starting point.

Foxboron commented 1 year ago

Is there a way for the user to later on install their desired locale? I did notice that the installer made me select the locales that I wanted to be available for use.

No, they need to be generated by locale-gen.

Maybe Toolbx can do some magic to make the locales from the host available to the container, but I don't know that will work across different glibc versions.

You can add the locale into /etc/locale.gen and run locale-gen.
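A minimal sketch of that step, assuming the usual /etc/locale.gen layout of one "LOCALE CHARMAP" pair per line with unused entries commented out. The helper name enable_locale is hypothetical; it appends the entry rather than uncommenting it in place, which avoids having to match the file's exact spacing.

```sh
# enable_locale LOCALE_GEN_FILE LOCALE CHARMAP
# Makes sure "LOCALE CHARMAP" (e.g. "de_DE.UTF-8 UTF-8") is an active,
# uncommented entry in LOCALE_GEN_FILE. Does nothing if it already is.
enable_locale() (
    file=$1
    locale_name=$2
    charmap=$3
    # awk splits on whitespace, so trailing spaces do not matter, and a
    # commented entry such as "#de_DE.UTF-8 UTF-8" never matches field 1.
    if [ -f "$file" ] && awk -v l="$locale_name" -v c="$charmap" \
        '$1 == l && $2 == c { found = 1 } END { exit !found }' "$file"; then
        :   # already enabled
    else
        printf '%s %s\n' "$locale_name" "$charmap" >> "$file"
    fi
)

# Assumed usage inside the container, as root:
#   enable_locale /etc/locale.gen de_DE.UTF-8 UTF-8 && locale-gen
```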

Wouldn't en_US.UTF-8 be a better choice? It has become the de facto default for user interfaces, and hence might be the least surprising as an arbitrary starting point.

I'm not sure what actual difference C.UTF-8 versus en_US.UTF-8 makes to user interfaces, really.

debarshiray commented 1 year ago

Is there a way for the user to later on install their desired locale? I did notice that the installer made me select the locales that I wanted to be available for use.

No, they need to be generated by locale-gen.

Maybe Toolbx can do some magic to make the locales from the host available to the container, but I don't know that will work across different glibc versions.

You can add the locale into /etc/locale.gen and run locale-gen.

I see, thanks.

Wouldn't en_US.UTF-8 be a better choice? It has become the de facto default for user interfaces, and hence might be the least surprising as an arbitrary starting point.

I'm not sure what actual difference C.UTF-8 versus en_US.UTF-8 makes to user interfaces, really.

I would expect C.UTF-8 and en_US.UTF-8 to both show the same strings in most cases, because programmers default to American English in their strings. The difference will show in things like how dates and numbers are formatted, paper sizes, units of measurement, etc. I have no idea what C.UTF-8 does for those, but with en_US.UTF-8 they should correspond to the standards in the US.

On my Arch Linux host with 16 extra locales on top of C, C.UTF-8 and POSIX, the size of /usr/lib/locale/locale-archive is 6.4M. In comparison, on Fedora, with 866 extra locales, the size is 214M. I suspect that the size penalty will be pretty low if we added just en_US.UTF-8 on top of the 3 that are built into glibc.

Foxboron commented 1 year ago

I think adding it to the container is fine :)

debarshiray commented 1 year ago

When I inspect the running container, I find that these are the environment variables:

    "Env": [
      "TOOLBOX_PATH=/usr/bin/toolbox",
      "XDG_RUNTIME_DIR=/run/user/1000",
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "TERM=xterm",
      "container=podman",
      "LANG=C.UTF-8",
      "HOSTNAME=toolbox",
      "HOME=/var/home/mogoh"
    ],

That "LANG=C.UTF-8" is coming from the Arch Linux base image.

$ podman pull docker.io/library/archlinux:base-devel
...
$ podman inspect --format '{{ .Config.Env }}' --type image docker.io/library/archlinux:base-devel
[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin LANG=C.UTF-8]

debarshiray commented 1 year ago

Maybe Toolbx can do some magic to make the locales from the host available to the container, but I don't know that will work across different glibc versions.

Sadly that magic can't be a straight bind mount of /usr/lib/locale from the host to the container. See: https://bugzilla.redhat.com/show_bug.cgi?id=956993 and https://sourceware.org/legacy-ml/libc-alpha/2013-04/msg00676.html and https://bugzilla.gnome.org/show_bug.cgi?id=698383

Thanks to @halfline for those references.

debarshiray commented 1 year ago

I spent some time over the past few days digging into this.

First, containers created from the ubuntu-toolbox image also suffer from this same problem, in the sense that they only have the C, C.UTF-8 and POSIX locales inside, because Ubuntu follows the same approach with /etc/locale.gen and locale-gen(8) as Arch Linux.

However, locale(1) doesn't throw any errors inside the containers because the shell start-up files on Ubuntu sanitize the environment. Specifically:

$ cat /etc/profile.d/01-locale-fix.sh 
# Make sure the locale variables are set to valid values.
eval $(/usr/bin/locale-check C.UTF-8)

In this case, that snippet will set LANG to C.UTF-8, which is why locale(1) doesn't complain.
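The idea behind that start-up fix can be sketched as follows. This is not the actual locale-check implementation, only an illustration of "keep LANG if the locale exists, otherwise fall back". The function names are hypothetical, and the codeset folding assumes that `locale -a` prints names in the de_DE.utf8 spelling.

```sh
# normalize_locale NAME
# Lowercases the codeset and drops hyphens, so that "de_DE.UTF-8" and the
# "de_DE.utf8" spelling printed by `locale -a` compare equal.
normalize_locale() {
    case $1 in
        *.*)
            printf '%s.%s\n' "${1%%.*}" \
                "$(printf '%s' "${1#*.}" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# sanitized_lang WANTED FALLBACK
# Reads the available locale names on stdin (one per line) and prints
# WANTED if it is among them, FALLBACK otherwise.
sanitized_lang() (
    wanted=$(normalize_locale "$1")
    while IFS= read -r name; do
        if [ "$(normalize_locale "$name")" = "$wanted" ]; then
            printf '%s\n' "$1"
            exit 0
        fi
    done
    printf '%s\n' "$2"
)

# Assumed usage in a shell start-up file:
#   LANG=$(locale -a | sanitized_lang "$LANG" C.UTF-8)
```

Applied to the Arch container from this report, where only C, C.UTF-8 and POSIX exist, a host value of de_DE.UTF-8 would be replaced by C.UTF-8.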

/cc @jmennius and @andrewshadura

debarshiray commented 1 year ago

Maybe Toolbx can do some magic to make the locales from the host available to the container, but I don't know that will work across different glibc versions.

Sadly that magic can't be a straight bind mount of /usr/lib/locale from the host to the container. See: https://bugzilla.redhat.com/show_bug.cgi?id=956993 and https://sourceware.org/legacy-ml/libc-alpha/2013-04/msg00676.html and https://bugzilla.gnome.org/show_bug.cgi?id=698383

I have one potential solution that will only work for Arch Linux containers running on Arch Linux hosts or Ubuntu on Ubuntu.

toolbox(1) will ensure that the container's /etc/locale.gen is kept synchronized with the host's and use an inotify(7) watch to detect changes to the host's /etc/locale.gen. We already do this for /etc/localtime, so that's easy. When there's a change on the host, we run locale-gen(8) inside the container to update /usr/lib/locale/locale-archive.
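The synchronization step, without the inotify(7) part, could look roughly like this in shell. It is a sketch only: the real logic would live in toolbox(1) itself, the function name sync_locale_gen is made up, and the /run/host path in the usage comment is an assumption about where the host's file is visible inside the container.

```sh
# sync_locale_gen HOST_FILE CONTAINER_FILE
# Copies HOST_FILE over CONTAINER_FILE when their contents differ.
# Returns 0 if the container copy was updated, 1 if it was already current.
sync_locale_gen() {
    if cmp -s "$1" "$2"; then
        return 1
    fi
    cp -- "$1" "$2"
}

# Assumed usage inside the container, as root; locale-gen only runs when
# the file actually changed:
#   if sync_locale_gen /run/host/etc/locale.gen /etc/locale.gen; then
#       locale-gen
#   fi
```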

I am discussing this with the GNU C Library folks to be sure that we explore all possible options and settle for the best possible one.

andrewshadura commented 1 year ago

Debian and Ubuntu have locales-all that should cover most cases already.

(Replying from my phone, sorry for being very concise.)

debarshiray commented 1 year ago

Debian and Ubuntu have locales-all that should cover most cases already.

Interesting. I didn't know about locales-all -- I am still learning about how things work outside the Fedora family.

I see that it uses the per-locale sub-directories in /usr/lib/locale instead of the mmap-able /usr/lib/locale/locale-archive blob. Why is that?

The size of the files in /usr/lib/locale is 229M. Do you think it will be alright to include locales-all in the ubuntu-toolbox images?

We include the equivalent of locales-all in the fedora-toolbox images (ie., glibc-all-langpacks, but it uses the mmap-able blob, not the per-locale sub-directories). That's how Fedora Silverblue and Workstation hosts are configured. So, other than increasing the sizes of the images by 200 odd megabytes, it has the advantage of exactly matching the host's configuration.

I am wondering why Ubuntu Desktop doesn't use locales-all, and what the intended purpose of this package is? Since it doesn't use the faster mmap-able blob, I am wondering if this might have some performance impact.

On one hand, the less we do at run-time in toolbox(1) with /etc/locale.gen and locale-gen(8), the more robust and testable things are, but on the other there's the increased image size and potential performance concerns.

andrewshadura commented 1 year ago

I am wondering why Ubuntu Desktop doesn't use locales-all, and what the intended purpose of this package is?

Typical Debian and Ubuntu installs don’t use it because it’s extra space, and also because you usually know what locales users most likely want to use.

As for the purpose of locales-all, the package description states it :)

Description: GNU C Library: Precompiled locale data
 This package contains the precompiled locale data for all supported locales. A better alternative is to install the locales package and only select desired locales, but it can be useful on a low-memory machine because some locale files take a lot of memory to be compiled.

I guess instead of doing inotify, toolbox could (on create? enter?) verify locales in /etc/locale.gen work (e.g. by trying to setlocale), and if some don’t, generate them. In fact, reading locale-gen’s source code, apparently this will happen if you run locale-gen --keep-existing.
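The verification step could be approximated by comparing the enabled entries in locale.gen against the list of installed locales, instead of calling setlocale directly. The sketch below does that; the function name missing_locales is hypothetical, and folding case and hyphens is an assumption made so that de_DE.UTF-8 matches the de_DE.utf8 spelling printed by `locale -a`.

```sh
# missing_locales LOCALE_GEN_FILE
# Reads the installed locale names on stdin (one per line, e.g. from
# `locale -a`) and prints every locale enabled in LOCALE_GEN_FILE that is
# not among them.
missing_locales() (
    # Fold case and hyphens so "de_DE.UTF-8" and "de_DE.utf8" compare equal.
    fold_name() { tr '[:upper:]' '[:lower:]' | sed 's/-//g'; }
    available=$(fold_name)
    # Drop comments, then take the first field of each remaining line.
    sed 's/#.*//' "$1" | while read -r name _charmap || [ -n "$name" ]; do
        [ -n "$name" ] || continue
        folded=$(printf '%s\n' "$name" | fold_name)
        if ! printf '%s\n' "$available" | grep -qxF "$folded"; then
            printf '%s\n' "$name"
        fi
    done
)

# Assumed usage inside the container:
#   locale -a | missing_locales /etc/locale.gen
```

An empty result would mean there is nothing to generate, so the container could skip running locale-gen entirely.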

debarshiray commented 1 year ago

I am wondering why Ubuntu Desktop doesn't use locales-all, and what the intended purpose of this package is?

Typical Debian and Ubuntu installs don’t use it because it’s extra space, and also because you usually know what locales users most likely want to use.

It sounds like Ubuntu Desktop is a lot more disk space sensitive compared to Fedora Silverblue and Workstation, whereas the Fedora family has an explicit desire to avoid building things on the user's systems and instead ship pre-built and tested locales. Different groups, different philosophies and trade-offs, I guess. :)

So, I am assuming that we don't want to include locales-all in the ubuntu-toolbox images, because if it's too big for the Ubuntu Desktop ISO, then it's likely too big for the OCI image. This is, of course, just my (temporary) assumption for the rest of this comment. You are free to do otherwise, in which case, the problem goes away. :)

As for the purpose of locales-all, the package description states it :)

I was hoping to dig up the historical background behind the way things are done in (Debian and) Ubuntu. For example, here are some more historical references from GNOME and Fedora:

I guess instead of doing inotify, toolbox could (on create? enter?) verify locales in /etc/locale.gen work (e.g. by trying to setlocale), and if some don’t, generate them.

You mean the host's or the container's /etc/locale.gen?

I saw that Ubuntu has a downstream Settings patch that adds a Manage Installed Languages button to the Region & Language panel that Arch Linux doesn't have. I need to check exactly what that code does. So far, my assumption is that it edits /etc/locale.gen and runs locale-gen(8).

The plan that I described earlier would involve the container's entry point, which is toolbox init-container, doing these when entering a container for the first time with toolbox enter:

In fact, reading locale-gen’s source code, apparently this will happen if you run locale-gen --keep-existing.

By the way, Arch Linux's locale-gen is a lot more stripped down than (Debian's and) Ubuntu's. :)

andrewshadura commented 1 year ago

The plan that I described earlier would involve the container's entry point, which is toolbox init-container, doing these when entering a container for the first time with toolbox enter

Oh yeah, that’s what I meant. I just wasn’t sure whether inotify is actually needed, or whether just generating missing locales on the first enter would be enough… it’s not like users are likely to change locales every day?

By the way, Arch Linux's locale-gen is a lot more stripped down than (Debian's and) Ubuntu's. :)

Yes, Ubuntu have extended Debian’s, but for some reason their changes were not pushed back to Debian (although they tried at least once).

debarshiray commented 1 year ago

The plan that I described earlier would involve the container's entry point, which is toolbox init-container, doing these when entering a container for the first time with toolbox enter

Oh yeah, that’s what I meant. I just wasn’t sure whether inotify is actually needed, or whether just generating missing locales on the first enter would be enough… it’s not like users are likely to change locales every day?

If you look for fsnotify.NewWatcher in src/cmd/initContainer.go then you will see the current code for timezone handling. It's pretty simple.

I am worried about race conditions that may abort the localedef(1) process invoked by locale-gen(8) and negative fallout from that. People expect to have to log out after adding new locales. What if that aborts an ongoing localedef(1) process inside a Toolbx container? Do we risk ending up with a corrupt /usr/lib/locale/locale-archive because of that?

This can also happen if we update /usr/lib/locale/locale-archive when entering the container, because, at least in theory, the user may log out whenever they want.

I want to fully understand this before writing the code. :)

By the way, Arch Linux's locale-gen is a lot more stripped down than (Debian's and) Ubuntu's. :)

Yes, Ubuntu have extended Debian’s, but for some reason their changes were not pushed back to Debian (although they tried at least once).

I see. I didn't know that Ubuntu extended Debian's copy.

andrewshadura commented 1 year ago

Yes, updating it is tricky, I’ve seen some reports (see here) that updating locales causes apps to fail while the file is being written to. An alternative would be to compile locales into a different non-default file, and then atomically replace it?
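The write-elsewhere-then-replace idea is the usual temporary-file-plus-rename pattern. A generic sketch follows. The name atomic_generate is hypothetical, the generator is any command that writes its output to the path given as its last argument, and how localedef would be pointed at such a staging path is left open here.

```sh
# atomic_generate TARGET COMMAND [ARGS...]
# Runs COMMAND with ARGS plus a temporary path as its final argument, and
# renames the result onto TARGET only if COMMAND succeeded.
atomic_generate() (
    target=$1
    shift
    # Stage in TARGET's own directory, so the final mv is a rename on a
    # single filesystem: readers see the old file or the new one, never a
    # half-written mix.
    tmp=$(mktemp "$(dirname "$target")/.tmp.XXXXXX") || exit 1
    if "$@" "$tmp"; then
        # mktemp creates the file with mode 0600; widen it before publishing.
        chmod 644 "$tmp"
        mv -f "$tmp" "$target"
    else
        # The generator failed or was interrupted: keep the old file.
        rm -f "$tmp"
        exit 1
    fi
)
```

If the generator is aborted halfway, for example because the user logs out, the existing file stays untouched and only the staging file is lost.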

andrewshadura commented 1 year ago

Another alternative: pass --no-archive to localedef and let it create directories (also probably elsewhere initially, then atomically move to the right place).

travier commented 8 months ago

Closing as we'll soon withdraw the Arch Linux images in favor of the upstream ones from the toolbx project: https://github.com/toolbx-images/images/pull/82

If you can reproduce this issue with those images, please file an issue there. Thanks.