Open The-Bootloader opened 1 month ago
Repro steps:
On the host as root:
echo 0 > /proc/sys/kernel/randomize_va_space
I had the bug before doing this, but I performed the steps below in that mode (I thought it was an ASLR-induced bug initially), so I include it for completeness. I doubt it has an effect.
cd ~
mkdir src
cd src
git clone https://github.com/tianocore/edk2
cd edk2
git submodule update --init
git clone https://github.com/tianocore/edk2-platforms.git
cd edk2-platforms
git submodule update --init
cd ..
docker pull ghcr.io/tianocore/containers/ubuntu-22-dev:latest
docker run -it -v "${HOME}":"${HOME}" -e EDK2_DOCKER_USER_HOME="${HOME}" ghcr.io/tianocore/containers/ubuntu-22-dev:latest /bin/bash
On another terminal:
docker exec -itu 0 <instance id> apt update
docker exec -itu 0 <instance id> apt install gdb
(and perform the installation)
Go back to the initial terminal:
cd ~/src/edk2
. edksetup.sh
make -j2 -C BaseTools
vim Conf/target.txt
Then modify the following values:
TARGET_ARCH = X64
TOOL_CHAIN_TAG = GCC
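For a non-interactive alternative to editing the file in vim, the two values can be set with sed. This is only a sketch: the helper name `set_target_value` is made up here, and it assumes each key already appears in Conf/target.txt as an uncommented `KEY = value` line.

```shell
# Hypothetical helper (not part of edk2): rewrite one "KEY = value" line.
# Assumes the key already exists as an uncommented line; lines that do not
# match are left untouched.
set_target_value() {
  key="$1"; value="$2"; file="$3"
  tmp="$(mktemp)" || return 1
  # Replace the whole line that starts with the key, whatever its old value was.
  sed -E "s|^[[:space:]]*${key}[[:space:]]*=.*$|${key} = ${value}|" "$file" > "$tmp" || return 1
  # Copy the result back so the original file keeps its permissions.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Usage would be `set_target_value TARGET_ARCH X64 Conf/target.txt` followed by `set_target_value TOOL_CHAIN_TAG GCC Conf/target.txt`.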
Run build
Take a coffee
./Build/EmulatorX64/DEBUG_GCC/X64/Host
==> SIGSEGV
gdb ./Build/EmulatorX64/DEBUG_GCC/X64/Host
r to launch, wait for SIGSEGV in InternalMemSetMem in SetMem.c:76: *(Pointer8++) = Value; Noting Buffer=0x7ffff7d53a27
b main, then b AllocateZeroPool, then r to restart
c to go past main
Entering AllocateZeroPool at Buffer = AllocatePool(AllocationSize)
Press n to step over
print Buffer
displays 0x7ffff3bd3010
n again; execution is now about to call ZeroMem
Press s to step into
Just entered ZeroMem with Buffer = 0x7ffff7d53a27 which is incorrect
==> The code has lost the value of Buffer between the caller AllocateZeroPool
and the callee ZeroMem
==> This is either a compiler bug, or, an incorrect use of __attribute__((ms_abi))
which I see all over the build stdout
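The interactive session above can be approximated in gdb's batch mode, which makes the output easier to paste into a report. This is a sketch, assuming the first hit of AllocateZeroPool is the failing call and that the source is laid out as in the walk-through (one `next` to get past AllocatePool, one more to reach the ZeroMem call). The function name `gdb_repro_commands` is made up for illustration.

```shell
# Hypothetical helper: print the gdb commands from the walk-through above.
gdb_repro_commands() {
  cat <<'EOF'
break AllocateZeroPool
run
next
print Buffer
next
step
info args
EOF
}
```

One way to use it: `gdb_repro_commands > /tmp/repro.gdb`, then `gdb -batch -x /tmp/repro.gdb ./Build/EmulatorX64/DEBUG_GCC/X64/Host`. The `print Buffer` output is the value seen by the caller, and `info args` after the `step` shows what ZeroMem believes it received.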
Digging deeper
r for restart
layout split
in gdb
Go into AllocateZeroPool right before it calls ZeroMem. You will see that nothing is loaded into %rcx, which is where the first parameter goes under the Microsoft x64 calling convention (the one ms_abi selects).
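The same check can be made from the callee side by stopping at the entry of ZeroMem and comparing the two candidate registers: %rdi carries the first integer argument under the System V AMD64 convention that GCC uses by default on Linux, while %rcx carries it under the Microsoft x64 convention. A sketch with a made-up helper name, assuming the first call to ZeroMem is the one coming from AllocateZeroPool. If the caller was built with the System V convention, $rdi would be expected to hold the pointer returned by AllocatePool, but that is something to verify, not a confirmed result.

```shell
# Hypothetical helper: print gdb commands that compare the SysV and MS
# first-argument registers at the entry of ZeroMem, then show the caller.
register_check_commands() {
  cat <<'EOF'
break ZeroMem
run
info registers rdi rcx
info args
up
disassemble
EOF
}
```

It can be run the same way as any gdb command file: `register_check_commands > /tmp/regs.gdb`, then `gdb -batch -x /tmp/regs.gdb ./Build/EmulatorX64/DEBUG_GCC/X64/Host`.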
If you recompile with -O0 -g3 by adding, in [BuildOptions]
in EmulatorPkg.dsc:
GCC:DEBUG_*_*_CC_FLAGS = -O0 -g3
then you can have better granularity in GDB but the behavior is the same and you see that something goes really wrong in terms of calling convention
My hunch is that the build system sets up the ms-abi calling convention for some object files and not others. I further suspect that the "default" calling convention is not ms-abi on Linux, but may be ms-abi on Windows, making EmulatorPkg work by accident on Windows and not on Linux.
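This hunch can be checked against the build output by listing the compile commands that do not carry an explicit EFIAPI define. A rough sketch, assuming the build stdout was saved to a file (for example with `build 2>&1 | tee build.log`) and that each compiler invocation sits on one line containing `gcc` and ` -c `. Both the helper name and that log layout are assumptions. A line without the define is not necessarily wrong, since EFIAPI may also be set in headers.

```shell
# Hypothetical helper: print compile commands from a saved build log that
# have no -DEFIAPI=... on their command line.
compiles_without_efiapi() {
  # Keep lines that look like compile steps, then drop those with the define.
  grep -E 'gcc.* -c ' "$1" | grep -v -e '-DEFIAPI=' || true
}
```

Running `compiles_without_efiapi build.log` and comparing the listed objects with the ones on either side of the failing call (AllocateZeroPool and ZeroMem here) would show whether they were compiled with different flags.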
I see that we forgot to add GCC:*_GCC_X64_CC_FLAGS = "-DEFIAPI=__attribute__((ms_abi))"
to Unix/Host/Host.inf when we added the unversioned GCC toolchain tags.
This is a bug in the main edk2 repo, not in the container images.
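To see which toolchain tags in an .inf file carry the define, its CC_FLAGS entries can be scanned. A sketch with a made-up helper name; it assumes each CC_FLAGS entry fits on one line and takes the .inf path as an argument (in the edk2 tree the file mentioned above should be EmulatorPkg/Unix/Host/Host.inf).

```shell
# Hypothetical helper: for every GCC CC_FLAGS entry in an .inf file, report
# whether that entry sets EFIAPI on the compiler command line.
efiapi_by_tag() {
  grep -E '^[[:space:]]*GCC:[^[:space:]]*_CC_FLAGS' "$1" | while IFS= read -r line; do
    # Extract the "GCC:<target>_<tag>_<arch>_CC_FLAGS" key from the line.
    tag="$(printf '%s\n' "$line" | sed -E 's/^[[:space:]]*(GCC:[^[:space:]=]*_CC_FLAGS).*/\1/')"
    case "$line" in
      *-DEFIAPI=*) echo "$tag: EFIAPI defined" ;;
      *)           echo "$tag: EFIAPI not defined" ;;
    esac
  done
}
```

An entry reported as "not defined" for the toolchain tag in use (GCC in the steps above) would match the missing flag described here.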
Describe the bug
ubuntu-22-dev and fedora-37-dev seem to have either a GCC compiler bug producing invalid code, or a build environment issue that generates code with incompatible calling conventions between linked object files.
This happens with the default build options and also with -O0 -g3 added in EmulatorPkg.dsc. The issue is clearly visible when following the code with gdb.
To Reproduce
Doing another compilation, I got another similar behavior where the failure was in CopyMem with a bogus Length parameter (0x7FFFF7E1C440 which is probably a memory or function address from a stray uninitialized register and has no business being in Length)
Since this happens with both containers, there are a few possibilities:
1) Maybe they use the same compiler version with the same bug (12.3.1 20230508 on Fedora and 12.3.0 on Ubuntu), in which case this is a containers issue (need an update?)
2) Maybe the issue comes from the EDK II build scripts, which try to specify incompatible ABIs between different object files linked together (this is relatively plausible, I have seen some ms-abi defines, but I don't know of any calling convention where no parameters are passed through registers), in which case this is an edk2 issue
3) Maybe I am doing something terribly wrong (improbable, the code compiles and is bogus between function calls, but I am an EDK2 newbie so this is possible), in which case PIBKAC and please let me know :-)
Execution environment
ubuntu-22-dev or fedora-37-dev, latest as of Aug 4 2024
Running on an X64 host running Ubuntu 22.04.4