Closed: cjwatson closed this issue 1 year ago
Does your host have `compat_uts_machine=armv7l` set on the kernel command line? We do this in our LXD armhf instances on arm64 hosts on focal, because otherwise containers end up declaring the armv8-32 machine type, which nobody uses.
It would be interesting to see the output of `uname -a` from inside your container.
@cjwatson can you try `umount /proc/cpuinfo` in the container?
My current guess is that the kernel has made `cpuinfo` affinity-aware (urgh), but in our case lxcfs provides /proc/cpuinfo as a FUSE overlay inside the container (to filter the CPUs based on cgroups). LXCFS itself is an arm64 piece of code running on the host, so regardless of the personality of the caller, /proc/cpuinfo from the kernel will be accessed by an arm64 binary.
If that's indeed the issue, we can move the bug over to lxcfs and see if there's some kind of way to:
1) Determine the personality of the caller process (whatever opens /proc/cpuinfo in the container)
2) Somehow trick the kernel into providing us the `cpuinfo` content for that personality rather than our own
I suspect that 1) should be easy enough to figure out through some proc file; 2) may be a bit more challenging though.
@xnox My Canonistack test didn't have `compat_uts_machine=armv7l` on the command line, but Launchpad's arm64 builder VMs do. In a container on a builder, `uname -a` prints `Linux flexible-bluejay 5.4.0-124-generic #140-Ubuntu SMP Thu Aug 4 02:27:01 UTC 2022 armv7l armv7l armv7l GNU/Linux`.
@stgraber You're quite right: /proc/cpuinfo is mounted, and if I unmount it then I see the correct features.
- Determine the personality of the caller process (whatever opens /proc/cpuinfo in the container)

`cat /proc/$PID/personality` should give the value of the calling process.

- Somehow trick the kernel into providing us the `cpuinfo` content for that personality rather than our own. I suspect that 1) should be easy enough to figure out through some proc file, 2) may be a bit more challenging though.
I think one can use the syscall `previous = personality(PER_LINUX32);` to switch to 32-bit, or use whatever value one got from the procfs personality file. Check that the return is not negative, and restore the personality after one is done.
Moving over to LXCFS. It may take us a little while before we have manpower to put on this (we'll have a new hire on it, just not sure about start date yet).
Until then, I'd recommend unmounting /proc/cpuinfo in such environments. It will have the downside of possibly over-reporting the number of CPU cores available to some tools, but that's likely less problematic than the incorrect CPU flags.
@stgraber Thanks for the suggestion. I've proposed https://code.launchpad.net/~cjwatson/launchpad-buildd/+git/launchpad-buildd/+merge/428923 for that.
This is worked around on Launchpad production now.
We can try to detect the calling process's pid on the FUSE daemon side, because we have the pid in the `struct fuse_in_header` structure, and then use it to obtain the personality of the caller.
I might have something related. On Raspbian I have a similar issue after switching to the 64-bit kernel; all containers are still 32-bit. After the change, multiple entries in /proc inside the containers were not updated. I also tested a 64-bit container, with the same result. What the entries had in common was their size of 4096 bytes rather than the usual 0 bytes. I added the following one-liner to the startup process of every container, which is a workaround for me:

`/usr/bin/find /proc/ -maxdepth 1 -size 4096c -exec /bin/umount {} \;`

I think it is related to cpuinfo but not limited to only that entry. But then the Raspbian kernel is a bit special anyway. I hope someone finds this workaround useful.
@lanmarc77 this issue was already fixed in https://github.com/lxc/lxcfs/pull/567. You just need to update lxcfs on your machines.
Required information
Issue description
We've been trying to work out why Rust-based snap builds for armhf hang on Launchpad's build farm, where they're executed in armhf containers via LXD on arm64 machines, also using `linux32` to set the personality (although this may not be necessary when running in a 32-bit LXD container - I think LXD already handles that?). We seem to be running into something like https://github.com/rust-lang/rust/issues/60605, but it's a little weirder than that. `rustup` is only picking `arm` (i.e. ARMv6) because it gets confused about the processor's capabilities. rustup-init.sh has this code:

And we're seeing:
I tried to track this down in a less weird environment than a builder, launching an Ubuntu 22.04 arm64 machine as described in the `lxc info` output above. I got as far as this:

This seems pretty odd, but at this point I don't know where to look next. Is this an LXD bug for somehow failing to set up the environment correctly, or is it a kernel bug for getting confused by containerization and somehow not noticing the personality change?
Steps to reproduce
`lxc launch` an armhf container on arm64, and run `linux32 grep -m1 ^Features /proc/cpuinfo` inside it.