Closed mbenlioglu closed 11 months ago
I'm not sure I understand correctly: you run box64 in an arm64 docker image using qemu on x86_64, and box64 doesn't behave correctly? But the same box64 from the arm64 docker image works fine when run directly on arm64? If that is the case, it looks more like a qemu issue than a box64 issue. Or did I get something wrong? (That's a lot of emulation layers here!)
Yes, box64 works fine on native arm but doesn't behave correctly under qemu on x86_64. If I want automated builds for dockerhub etc., I pretty much need qemu for automated deployments, even though it's a lot of emulation, because my only other option is to buy a native arm server for automated builds.
I was hoping it was some kind of UB that could be fixed because I think the reaction I'll get from qemu devs will probably be "just don't use double emulation"
The problem is, the error "realloc(): invalid old size" is super generic. Can you try to run with valgrind? (So that will be triple emulation :O.) It might give useful detail on the issue.
Also, does test01 from the tests folder give the same issue?
Also, you can try to run with the BOX64_DYNAREC=0 env. var. to disable the dynarec; it might run better under qemu.
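A minimal sketch of that last suggestion, assuming box64 is on PATH. The helper name run_no_dynarec and the example path are hypothetical and not taken from the thread; only the BOX64_DYNAREC=0 variable comes from the comment above. The helper sets the variable for a single command, so the rest of the shell session keeps its default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one command with the box64 dynarec disabled.
# BOX64_DYNAREC=0 is the variable named in the thread; the rest is illustrative.
run_no_dynarec() {
    # env sets the variable only for the child process, not for this shell.
    env BOX64_DYNAREC=0 "$@"
}

# Example (assumed path): run_no_dynarec box64 ./tests/test01
```

Wrapping the call this way makes it easy to run the same binary with and without the dynarec and compare the two results.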
I will try those in a bit and provide debug outputs
Interestingly, disabling dynarec with the env variable makes it run correctly. Here are the valgrind outputs for running test01 (segfault happens if I don't set the dynarec env var):
Program output with dynarec enabled (no valgrind):
Dynarec for ARM64, with extension: ASIMD AES CRC32 PMULL ATOMICS PageSize:4096realloc(): invalid pointer
Aborted (core dumped)
Program output with BOX64_DYNAREC=0 (no valgrind):
Dynarec is off
Box64 with Dynarec v0.2.0 6668614 built on Nov 26 2022 23:02:04
Using default BOX64_LD_LIBRARY_PATH: ./:lib/:lib64/:x86_64/:bin64/:libs64/
Using default BOX64_PATH: ./:bin/
Counted 13 Env var
Looking for test01
Rename process to "test01"
Using native(wrapped) libc.so.6
Using native(wrapped) ld-linux-x86-64.so.2
Using native(wrapped) libpthread.so.0
Using native(wrapped) librt.so.1
Hello x86_64 World!
Ok, I think I found the issue that is triggering the realloc error. I have pushed a fix on both box86 & box64. Hopefully, it will make it work good enough for you now.
@mbenlioglu did you try it? Can this ticket be closed?
The last few weeks were a bit busy. I was about to run a test today and will update you later today.
The results are interesting. I thought it was fixed at first because I ran all the tests in the repo on both the native arm and qemu systems, like the following:
for x in {01..20}; do
    echo -e "\ntest$x" >> tests86
    box86 ./box86/tests/test$x >> tests86
done
for x in {01..22}; do
    echo -e "\ntest$x" >> tests64
    box64 ./box64/tests/test$x >> tests64
done
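One way to check that the result files from the two systems agree is a plain diff. This is only a sketch: the helper name compare_results and the example file locations are hypothetical, and it assumes the tests86/tests64 files from each system have been copied side by side.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two result files and report whether they match.
# Returns 0 when the files are identical and 1 when they differ.
compare_results() {
    local native_file=$1 qemu_file=$2
    if diff -u "$native_file" "$qemu_file"; then
        echo "identical: $native_file $qemu_file"
    else
        echo "differs: $native_file $qemu_file"
        return 1
    fi
}

# Example (assumed locations): compare_results native/tests64 qemu/tests64
```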
Outputs were the same and there were no errors that I could see. But when I ran steamcmd.sh with box86, both systems gave a segfault.
Box86 with Dynarec v0.2.9 47581dd built on Dec 14 2022 11:41:41
Error: reading elf header of /tmp/steamcmd/steamcmd.sh, try to launch using bash instead
Box86 with Dynarec v0.2.9 47581dd built on Dec 14 2022 11:41:41
Error: reading elf header of /opt/box64/bash, try to launch using box64 instead
Segmentation fault (core dumped)
(Sidenote: I realized that the segfault happens when running any bash script; I discovered it while trying to run test.sh below. Note that the BOX64_BASH variable is set to the location of the x64 bash provided in the box64 repo.)
I tried running it with box64 instead, but that fails on the qemu system. The problem comes down to steamcmd.sh calling itself with exec to do a restart when it needs to. This test script mimics the behavior:
#!/usr/bin/env bash
FILE_NAME=`basename "$0" .sh`
UNAME=`uname`
ARCH=`uname -m`
ARG=${1:-first}
echo "Hello, my name is ${FILE_NAME}. I'm running on a ${ARCH} ${UNAME} machine. This is my ${ARG} run"
if [[ "${ARG}" == "first" ]]; then
    exec "$0" second
fi
Native output (no box86/box64 emulation):
Hello, my name is test. I'm running on a aarch64 Linux machine. This is my first run
Hello, my name is test. I'm running on a aarch64 Linux machine. This is my second run
Valgrind outputs for running the box86 $BOX64_BASH command:
I'm periodically retrying with the latest updates; the issues still persist. The segfault when calling box86 $BOX64_BASH might have higher priority since it's also present on native arm. The problem with the script calling itself with exec is only present in qemu (I also tested different versions of qemu to no avail).
@ptitSeb I played around a little bit and discovered a few things related to this.
The segfault when running box86 $BOX64_BASH happens only on Debian and Raspberry Pi OS, not on Ubuntu (all 64-bit). It is reproducible natively on a Raspberry Pi 4 with the latest 64-bit Raspberry Pi OS (based on Debian bullseye). Apparently, an invalid pointer (emu=0x1 in the backtrace) is being passed to the function internalFreeX86. Here's the gdb backtrace of the segfault:
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x629aff14 in internalFreeX86 (emu=0x1) at /home/testPi/repos/box86/src/emu/x86emu.c:187
187 if(emu->stack2free)
(gdb) backtrace
#0 0x629aff14 in internalFreeX86 (emu=0x1) at /home/testPi/repos/box86/src/emu/x86emu.c:187
#1 0x629affe0 in FreeX86Emu (emu=0x63a18198) at /home/testPi/repos/box86/src/emu/x86emu.c:198
#2 0x62a775bc in emuthread_destroy (p=0x63a18190) at /home/testPi/repos/box86/src/libtools/threads.c:180
#3 0x62a7db7c in fini_pthread_helper (context=0x63a1a120) at /home/testPi/repos/box86/src/libtools/threads.c:1038
#4 0x6288e76c in finiAllHelpers (context=0x63a1a120) at /home/testPi/repos/box86/src/box86context.c:53
#5 0x6288f7d8 in FreeBox86Context (context=0x63748ca4 <my_context>) at /home/testPi/repos/box86/src/box86context.c:360
#6 0x6288cf6c in main (argc=2, argv=0xfffef534, env=0xfffef540) at /home/testPi/repos/box86/src/main.c:1408
The second issue, related to the test script, was caused by BOX64_PATH and BOX64_LD_LIBRARY_PATH overriding the PATH and LD_LIBRARY_PATH environment variables, respectively. I couldn't reproduce this issue natively; it only shows up in the qemu-emulated system. As a temporary workaround, I managed to get it working by exporting BOX64_PATH=$BOX64_PATH:$PATH. I'm not sure whether this one is caused by an underlying undefined behavior, but the temporary workaround works fine, so it's not as important as the first one.
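The workaround can be sketched as a small shell function. The function name append_path_to_box64_path is hypothetical; the only part taken from the thread is appending PATH to BOX64_PATH. The sketch additionally assumes that an unset or empty BOX64_PATH should not leave a leading colon.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append PATH to BOX64_PATH, as in the workaround above.
append_path_to_box64_path() {
    # If BOX64_PATH is unset or empty, use PATH alone to avoid a leading ':'.
    export BOX64_PATH="${BOX64_PATH:+$BOX64_PATH:}$PATH"
}

# Example: call append_path_to_box64_path once before launching box64.
```

Calling it more than once appends PATH again each time, so it is meant to run a single time per session.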
I pushed something that should help. But this looks like a QEMU-specific issue to me: a pthread_getspecific(...) will not return a NULL value if it has never been initialized before. Well, I suppose it's not bad to have proper code and initialize it anyway.
The segfault was actually happening on a bare-metal Raspberry Pi with Raspberry Pi OS. I'll try your patch now and update you.
System information:
Description:
I'm attempting to bring arm64 support to a steamcmd docker image. Currently, the build and execution work fine on a native arm CPU (tested on a Raspberry Pi 4 with a 64-bit OS). Unfortunately, when I attempt to cross-build the arm image on an x86_64 system with qemu, following the docker documentation (which is what happens on automated builds in dockerhub, github actions, etc.), I get this error during the execution of box86/box64:
This happens any time I attempt to run something with box86 or box64, e.g. the following will trigger this:
where /opt/box64/bash is the one provided here. You can see the configuration I use in my Dockerfile. Running the following in an attempt to cross-build the image on an x86 system will reproduce the error:
The error happens on line 73 of the Dockerfile.
PS: I tried to build an image on native arm, then tried to run the image on an x86 system with qemu to see if that makes any difference, but the error occurs either way.