intel / ccc-linux-guest-hardening

Linux Security Hardening for Confidential Compute
https://intel.github.io/ccc-linux-guest-hardening-docs
MIT License

Guest ABORT: Attempt to finish kAFL run but never initialized #95

Closed williamcroberts closed 1 year ago

williamcroberts commented 1 year ago

Bug

fuzz.sh run build -p1 --debug

<snip>

Worker-00 Guest ABORT: Attempt to finish kAFL run but never initialized

Worker-00 Failed to connect to Qemu: Guest ABORT: Attempt to finish kAFL run but never initialized

Worker-00 Shutting down Qemu after 0 execs..
qemu-system-x86_64: terminating on signal 15 from pid 609811 (/home/bill/workspace/ccc-linux-guest-hardening/kafl/.venv/bin/python3)
Worker-00 Qemu exit code: None
Worker-00 Failed to launch Qemu.
Worker 0 sent ABORT..
Manager exit: Workers aborted before becoming ready. Likely broken VM or agent setup.
Waiting for Workers to shutdown...

Workaround

The "fix" is to change the Kconfig options in the .config file under the build directory (in my case ~/data/test1/BOOT_POST_TRAP/build) as shown in the diff below, where ~/config.bak is the good config and .config is the broken one, AND THEN BUILD MANUALLY by going into the build directory and running make -j$(nproc).

# ~/config.bak is good, .config is broken
diff ~/config.bak .config
331c331
< # CONFIG_TDX_FUZZ_KAFL_SKIP_CPUID is not set
---
> CONFIG_TDX_FUZZ_KAFL_SKIP_CPUID=y
333,334c333,334
< # CONFIG_TDX_FUZZ_KAFL_SKIP_ACPI_PIO is not set
< # CONFIG_TDX_FUZZ_KAFL_SKIP_RNG_SEEDING is not set
---
> CONFIG_TDX_FUZZ_KAFL_SKIP_ACPI_PIO=y
> CONFIG_TDX_FUZZ_KAFL_SKIP_RNG_SEEDING=y
348d347
< # CONFIG_TDX_FUZZ_KAFL_VANILLA_INJECTION_SAMPLE is not set
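
A comparison like the diff above can also be scripted so that only the fuzzing-related options are reported. The following is a minimal sketch, not part of the repository: the function names `parse_kconfig` and `diff_kconfig` are hypothetical, and the `CONFIG_TDX_FUZZ_` prefix filter is an assumption based on the option names in this issue.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines of a kernel .config.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_kconfig(good_text, broken_text, prefix="CONFIG_TDX_FUZZ_"):
    """Return {option: (good_value, broken_value)} for options that differ.

    Options absent from one file are reported as None on that side.
    """
    good = parse_kconfig(good_text)
    broken = parse_kconfig(broken_text)
    changed = {}
    for name in sorted(set(good) | set(broken)):
        if name.startswith(prefix) and good.get(name) != broken.get(name):
            changed[name] = (good.get(name), broken.get(name))
    return changed
```

Reading ~/config.bak and .config into strings and passing them to `diff_kconfig` should list the same options as the diff above, with the good value first and the broken value second.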

williamcroberts commented 1 year ago

The linux-guest at the time of this bug was set on:

commit 9950c09711a6e413c167ba0752381d64d292b822 (HEAD -> kafl/fuzz-5.15-4, origin/kafl/fuzz-5.15-4)

tz0 commented 1 year ago

Hi @williamcroberts, could you share the full broken .config? So far, I can't reproduce this bug by changing the Kconfig options above, rebuilding, and testing the same Linux guest (same commit).

I also cannot find any loopholes (from the code) between init_harness.py <dir> BOOT_POST_TRAP to fuzz.sh build that could result in leaving these config options not set.

williamcroberts commented 1 year ago

Hi @williamcroberts, could you share the full broken .config? So far, I can't reproduce this bug by changing the Kconfig options above, rebuilding, and testing the same Linux guest (same commit).

I also cannot find any loopholes (from the code) between init_harness.py <dir> BOOT_POST_TRAP to fuzz.sh build that could result in leaving these config options not set.

Are you sure you're reading the diff correctly? The working build is the one with the options turned off.

tz0 commented 1 year ago

I also cannot find any loopholes (from the code) between init_harness.py <dir> BOOT_POST_TRAP to fuzz.sh build that could result in leaving these config options not set.

Yes, I created confusion with the line above. The current automation script sets these three options when the BOOT_POST_TRAP harness is selected with init_harness.py. I will strike out the confusing line above.

I cannot reproduce this bug: I tested with the options both on and off and observed no abort in either case. Other culprits might be involved. With the full broken .config, I can check for a compiler version difference.

williamcroberts commented 1 year ago

@tz0 here is the working config config.bak.txt

williamcroberts commented 1 year ago

@tz0 here is the broken config config.broken.txt

tz0 commented 1 year ago

I can reproduce the issue after trying a few more machines.

The actual option that causes the issue (on my side) is CONFIG_TDX_FUZZ_KAFL_SKIP_RNG_SEEDING=y. I created a PR that links to this problem and proposes a temporary fix.

@williamcroberts, if you have time, please help check whether the proposed fix (only disabling CONFIG_TDX_FUZZ_KAFL_SKIP_RNG_SEEDING) would work or not on your side. Leaving the other options set, you can still avoid having input injections for CPUIDs or ACPI subsystems.

The underlying cause and a proper fix need further investigation.
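
To try the proposed change (turning off only CONFIG_TDX_FUZZ_KAFL_SKIP_RNG_SEEDING and leaving the other options set), the .config text can be edited programmatically. This is a minimal sketch under the assumption that the option appears as a plain `CONFIG_...=y` line; `disable_option` is a hypothetical helper, not something shipped with the repository.

```python
import re


def disable_option(config_text, option):
    """Return config_text with `option` rewritten to '# <option> is not set'.

    Only an existing '<option>=...' line is changed; everything else is kept.
    """
    pattern = re.compile(rf"^{re.escape(option)}=.*$", flags=re.MULTILINE)
    return pattern.sub(f"# {option} is not set", config_text)
```

After writing the result back to .config, the guest still has to be rebuilt from the build directory (make -j$(nproc), as described earlier in this thread). The kernel tree's own scripts/config tool with its --disable switch is an alternative to a hand-written helper like this one.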

il-steffen commented 1 year ago

One way to fix this is switching off the option CONFIG_TDX_FUZZ_KAFL_SKIP_RNG_SEEDING in the .config under your guest kernel's build directory and re-build the guest.

Are you sure this is the right fix? It seems you are just enabling RNG_SEEDING as a workaround but the actual problem is that your harness does not consume inputs.

The underlying issue is that no injection is happening, meaning the kernel/agent is not configured properly or the VM settings (kafl/qemu config) do not match. A typical example is trying to fuzz virtio without enabling any virtio subsystems in the VM config. No devices will be enabled and the kernel skips all virtio setup, so no virtio-specific injection is executed. The harness may see "enable" and "done" events somewhere in core kernel code before and after the virtio subsystem, but on "done" it will complain that no actual injection occurred.

This is why init_harness.py not only outputs a kernel .config but also a kafl/qemu config. They should be picked up automatically when you use "fuzz.sh build" and "fuzz.sh run" with the generated target directory.
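
The failure mode described above can be pictured with a toy model. This is purely illustrative and is not the kAFL agent code: the class `ToyHarness`, the function `boot`, and the error text are all hypothetical, and the real harness logic lives in the guest kernel.

```python
class ToyHarness:
    """Hypothetical model of a harness's bookkeeping; not the kAFL agent code."""

    def __init__(self):
        self.enabled = False
        self.injections = 0

    def enable(self):
        # Harness start event reached in kernel code.
        self.enabled = True

    def inject(self):
        # A fuzzed value is handed to the guest, but only while the harness is enabled.
        if self.enabled:
            self.injections += 1

    def done(self):
        # Harness end event: finishing without any injection is treated as a setup error.
        if self.injections == 0:
            raise RuntimeError("harness finished without consuming any input")
        return self.injections


def boot(harness, virtio_devices):
    """Simulate a boot in which injection happens only inside virtio setup."""
    harness.enable()
    for _ in range(virtio_devices):  # skipped entirely when the VM has no virtio devices
        harness.inject()
    return harness.done()
```

In this model, `boot(ToyHarness(), 0)` raises because "enable" and "done" are both reached while the code in between never injects anything, which mirrors a VM config that does not match the selected harness.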

williamcroberts commented 1 year ago

I can build a broken version, apply this patch and it works.

ereshetova commented 1 year ago

When I did a clean new installation and followed the docs to use BOOT_POST_TRAP, I ran into the same issue. However, what fixed it for me was running this:

echo "options kvm-intel ve_injection=1 halt_on_triple_fault=1" | sudo tee /etc/modprobe.d/kvm-intel.conf

and rebooting for the changes to take effect.
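
After the reboot, one way to confirm that the module options were picked up is to read them back from sysfs. This is a minimal sketch that assumes the host's kvm-intel module exposes `ve_injection` and `halt_on_triple_fault` under /sys/module/kvm_intel/parameters (the module name uses an underscore in sysfs); whether these files exist depends on the host kernel and on the permissions the module assigns to each parameter. `read_kvm_intel_params` is a hypothetical helper.

```python
from pathlib import Path


def read_kvm_intel_params(names=("ve_injection", "halt_on_triple_fault"),
                          base="/sys/module/kvm_intel/parameters"):
    """Return {name: value or None}; None means the parameter file is not readable."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(base) / name).read_text().strip()
        except OSError:
            values[name] = None
    return values
```

A value of None for either parameter suggests that the running kvm-intel module does not provide it, or was loaded without the options file in place.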

ereshetova commented 1 year ago

@williamcroberts do you still experience issues here or are you able to test things with default configs from harnesses?

ereshetova commented 1 year ago

Closing this one since the issue was resolved