tianocore / containers

Repository to maintain and manage edk2 containers

Compiler bug in latest -dev containers #98

Open The-Bootloader opened 1 month ago

The-Bootloader commented 1 month ago

Describe the bug

ubuntu-22-dev and fedora-37-dev appear to have either a GCC compiler bug that produces invalid code, or a build environment issue that generates code with incompatible calling conventions between linked object files.

This happens with the default build options and also with -O0 -g3 added in EmulatorPkg.dsc. The issue is clearly visible when stepping through the code with gdb.

To Reproduce

After another compilation I saw similar behavior: the failure was in CopyMem with a bogus Length parameter (0x7FFFF7E1C440, which is probably a memory or function address left in a stray uninitialized register and has no business being in Length).

Since this happens with both containers, there are a few possibilities:

1) Maybe they use the same compiler version with the same bug (12.3.1 20230508 on Fedora and 12.3.0 on Ubuntu), in which case this is a containers issue (need an update?).
2) Maybe the issue comes from the EDK II build scripts, which try to specify incompatible ABIs between different object files linked together. This is relatively plausible: I have seen some ms-abi defines, but I don't know of any calling convention where no parameters are passed through registers. In that case this is an edk2 issue.
3) Maybe I am doing something terribly wrong (improbable, since the code compiles and goes wrong between function calls, but I am an EDK2 newbie so this is possible), in which case PIBKAC and please let me know :-)

Execution environment

ubuntu-22-dev or fedora-37-dev, latest as of Aug 4 2024
Running on an X64 host running Ubuntu 22.04.4

The-Bootloader commented 1 month ago

Repro steps:

On the host as root:

echo 0 > /proc/sys/kernel/randomize_va_space

I had the bug before doing this, but I performed the steps below in that mode (I initially thought it was an ASLR-induced bug), so I include it for completeness. I doubt it has an effect.

cd ~
mkdir src
cd src

git clone https://github.com/tianocore/edk2
cd edk2
git submodule update --init

git clone https://github.com/tianocore/edk2-platforms.git
cd edk2-platforms
git submodule update --init
cd ..

docker pull ghcr.io/tianocore/containers/ubuntu-22-dev:latest
docker run -it -v "${HOME}":"${HOME}" -e EDK2_DOCKER_USER_HOME="${HOME}" ghcr.io/tianocore/containers/ubuntu-22-dev:latest /bin/bash

On another terminal:

docker exec -itu 0 <instance id> apt update
docker exec -itu 0 <instance id> apt install gdb

(and perform the installation)

Go back to the initial terminal:

cd ~/src/edk2
. edksetup.sh
make -j2 -C BaseTools
vim Conf/target.txt

Then modify the following values:

TARGET_ARCH = X64
TOOL_CHAIN_TAG = GCC

Run build
Take a coffee

./Build/EmulatorX64/DEBUG_GCC/X64/Host

==> SIGSEGV

gdb ./Build/EmulatorX64/DEBUG_GCC/X64/Host

r to launch, wait for SIGSEGV in InternalMemSetMem at SetMem.c:76:

*(Pointer8++) = Value;

Note that Buffer=0x7ffff7d53a27

b main
b AllocateZeroPool
r to restart

c to go past main
Entering AllocateZeroPool at Buffer = AllocatePool(AllocationSize)
Press n to step over
print Buffer displays 0x7ffff3bd3010
n again, is about to go to ZeroMem
Press s to step into
Just entered ZeroMem with Buffer = 0x7ffff7d53a27, which is incorrect

==> The code has lost the value of Buffer between the caller AllocateZeroPool and the callee ZeroMem
==> This is either a compiler bug, or an incorrect use of __attribute__((ms_abi)), which I see all over the build stdout

Digging deeper:

r to restart
layout split in gdb
Go into AllocateZeroPool right before it calls ZeroMem. You will see that nothing puts anything in %rcx, which is the location for the first parameter (the Microsoft x64 calling convention, which UEFI uses on X64).
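To make the %rcx remark concrete, here is a minimal sketch of where each convention puts the first integer argument: RCX under the Microsoft x64 convention, RDI under the System V AMD64 convention that Linux uses by default. It assumes an x86-64 Linux host with GCC or Clang; the demo_* names are hypothetical and are not part of edk2.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of edk2. Each one is implemented in
 * assembly below and simply returns the contents of one register. */
#if defined(__x86_64__) && defined(__GNUC__) && defined(__linux__)

/* Callers place the first argument in RCX for this declaration. */
uint64_t __attribute__((ms_abi)) demo_return_rcx(uint64_t first);

/* Callers place the first argument in RDI for this declaration. */
uint64_t __attribute__((sysv_abi)) demo_return_rdi(uint64_t first);

__asm__(
    ".pushsection .text\n"
    ".globl demo_return_rcx\n"
    ".type demo_return_rcx, @function\n"
    "demo_return_rcx:\n"
    "    movq %rcx, %rax\n"   /* return whatever the caller left in RCX */
    "    ret\n"
    ".size demo_return_rcx, .-demo_return_rcx\n"
    ".globl demo_return_rdi\n"
    ".type demo_return_rdi, @function\n"
    "demo_return_rdi:\n"
    "    movq %rdi, %rax\n"   /* return whatever the caller left in RDI */
    "    ret\n"
    ".size demo_return_rdi, .-demo_return_rdi\n"
    ".popsection\n");

#else

/* Fallback so the file still builds on other targets; it says nothing
 * about registers there. */
uint64_t demo_return_rcx(uint64_t first) { return first; }
uint64_t demo_return_rdi(uint64_t first) { return first; }

#endif

/* Returns 1 when each helper hands back the value it was called with,
 * i.e. when caller and callee agree on the argument register. */
int demo_first_arg_registers_match(void)
{
    return demo_return_rcx(42) == 42 && demo_return_rdi(42) == 42;
}
```

If the ms_abi attribute were dropped from the declaration of demo_return_rcx only, the caller would load RDI while the callee still reads RCX, and the returned value would be whatever happened to be in RCX. That matches the symptom seen in gdb: a stale value arriving as Buffer.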

If you recompile with -O0 -g3 by adding the following to [BuildOptions] in EmulatorPkg.dsc:

GCC:DEBUG_*_*_CC_FLAGS = -O0 -g3

you get finer granularity in GDB, but the behavior is the same and you can see that something goes really wrong with the calling convention.

My hunch is that the build system sets up the ms-abi calling convention for some object files and not others. I further suspect that the "default" calling convention is not ms-abi on Linux, but may be ms-abi on Windows, making EmulatorPkg work by accident on Windows and not on Linux.
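The hunch above can be illustrated with a small sketch of how a single EFIAPI macro keeps caller and callee on one convention. It assumes GCC or Clang on x86-64; the #ifndef block stands in for a -DEFIAPI=... compiler flag, and the demo_* names are hypothetical rather than edk2 code.

```c
#include <stddef.h>

/* Assumption: this block stands in for a -DEFIAPI=... compiler flag.
 * On targets other than x86-64 GCC/Clang the macro expands to nothing. */
#ifndef EFIAPI
#  if defined(__x86_64__) && defined(__GNUC__)
#    define EFIAPI __attribute__((ms_abi))
#  else
#    define EFIAPI
#  endif
#endif

/* Hypothetical callee in the style of ZeroMem: clears the buffer and
 * returns the same pointer. */
void *EFIAPI demo_zero_mem(void *buffer, size_t length)
{
    unsigned char *bytes = buffer;
    for (size_t i = 0; i < length; i++) {
        bytes[i] = 0;
    }
    return buffer;
}

/* Function-pointer type carrying the same attribute, so any caller
 * going through it uses the convention the callee was compiled with. */
typedef void *(EFIAPI *demo_zero_mem_fn)(void *buffer, size_t length);

/* Hypothetical caller: returns 1 if the buffer pointer and contents
 * come through the call intact, 0 otherwise. */
int demo_buffer_survives_call(void)
{
    unsigned char storage[16];
    for (size_t i = 0; i < sizeof storage; i++) {
        storage[i] = 0xAA;
    }

    demo_zero_mem_fn zero = demo_zero_mem;
    void *returned = zero(storage, sizeof storage);

    if (returned != storage) {
        return 0;
    }
    for (size_t i = 0; i < sizeof storage; i++) {
        if (storage[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

The point of the design is that every translation unit must see the same expansion of EFIAPI. If one object file is compiled with the macro expanding to the ms_abi attribute and another with it expanding to nothing, the prototypes still match textually and the link succeeds, but the arguments travel in different registers at run time.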

leiflindholm commented 1 month ago

I see that we missed adding

GCC:*_GCC_X64_CC_FLAGS = "-DEFIAPI=__attribute__((ms_abi))"

to Unix/Host/Host.inf when we added the unversioned GCC toolchain tags.

That is a bug in the main edk2 repo, as opposed to the container images.