xd009642 / tarpaulin

A code coverage tool for Rust projects
https://crates.io/crates/cargo-tarpaulin
Apache License 2.0

Segmentation fault when running inside a container #817

Closed kaenganxt closed 3 years ago

kaenganxt commented 3 years ago

Describe the bug

I recently encountered a segmentation fault when running tarpaulin (version 0.18.0) with multi-threaded tests inside an LXC container. This looked very similar to issue #190, as it appeared randomly. I noticed that tarpaulin prints the following warning every time it starts a test:

WARN cargo_tarpaulin::process_handling::linux: Failed to set processor affinity EINVAL: Invalid argument

I tried to prevent any actual parallel execution by setting the processor affinity manually with taskset -ac 0 cargo tarpaulin, which also failed with Invalid argument. Inspecting the affinity masks of existing processes showed that every process in this container runs with the mask 0xefb5420050 (16 bits set, i.e. 16 cores). This is probably because only some of the host's cores are forwarded into the container.

Setting the mask to one or two of those cores made the segfault disappear (e.g. with taskset -ac 4,6 cargo tarpaulin). Setting it to three or more cores, though, made the problem reappear.
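
For reference, here is a minimal standalone sketch (not part of tarpaulin; it only assumes the nix crate that tarpaulin already uses) that prints which CPUs are in the current process's affinity mask. Inside this container it lists the forwarded cores rather than CPUs starting at 0, which would explain why pinning to a core outside that mask fails with EINVAL:

use nix::sched::{sched_getaffinity, CpuSet};
use nix::unistd::Pid;

fn main() -> nix::Result<()> {
    // Read the affinity mask the kernel reports for this process.
    let affinity = sched_getaffinity(Pid::this())?;
    // Collect every CPU index that is allowed by the mask.
    let allowed: Vec<usize> = (0..CpuSet::count())
        .filter(|&cpu| affinity.is_set(cpu).unwrap_or(false))
        .collect();
    println!("CPUs in current affinity mask: {:?}", allowed);
    Ok(())
}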

Looking at the tarpaulin code at https://github.com/xd009642/tarpaulin/blob/develop/src/process_handling/linux.rs#L85, both the warning and the segfaults might be fixable by reading the existing affinity mask and restricting the process to one of the cores already in it. I'll try this out today or tomorrow to see if it consistently resolves the segfaults.

That said, I was wondering why this affinity setting is needed in the first place. It seems that tarpaulin only works correctly when no code actually executes in parallel, which the processor affinity enforces. Is this an intended limitation or a workaround for a deeper issue?

To Reproduce

I used the following example to produce the segfaults; they happen around 25% of the time.

lib.rs

xd009642 commented 3 years ago

So I'm not 100% sure it is needed. I either saw some code or documentation from something like strace or gdb that led me to think it could fix an intermittent segfault other people were reporting and that I couldn't reproduce. But tarpaulin has moved on a lot since then, so I think it might just be safer to remove it. Either checking the existing affinity or removing the call entirely would be a solution in my book; it just needs some more thorough testing to make sure it doesn't break anything.

kaenganxt commented 3 years ago

In my environment, the current implementation of setting the affinity is equivalent to setting no affinity at all, so that seems to be why I see those segfaults. The limit is therefore definitely needed for me: without it, many different tests that involve multiple threads hit these segfaults. I was just wondering why it is needed, since limiting tests to a single core increases execution time.

I replaced the limit_affinity function with the following:

use nix::sched::{sched_getaffinity, sched_setaffinity, CpuSet};
use nix::unistd::Pid;

pub fn limit_affinity() -> nix::Result<()> {
    let this = Pid::this();
    // Get the current affinity so we can limit the process to a core
    // that is already in the allowed mask.
    let affinity = sched_getaffinity(this)?;
    // Pick the first CPU that is set in the current mask.
    let mut selected_cpu = 0;
    for i in 0..CpuSet::count() {
        if affinity.is_set(i)? {
            selected_cpu = i;
            break;
        }
    }
    // Pin the process to that single CPU.
    let mut cpu_set = CpuSet::new();
    cpu_set.set(selected_cpu)?;
    sched_setaffinity(this, &cpu_set)
}

Running tarpaulin with this fix showed no more segfaults during my tests. Should I make a pull request?
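
If helpful for the PR, a sanity check along these lines could accompany the change (a sketch only; the test name and assertions are illustrative, not something already in the repo):

#[cfg(test)]
mod tests {
    use super::limit_affinity;
    use nix::sched::{sched_getaffinity, CpuSet};
    use nix::unistd::Pid;

    // After limit_affinity() the process should be pinned to exactly one CPU,
    // and that CPU must already have been allowed before the call.
    // Note: this test changes the affinity of the whole test process.
    #[test]
    fn pins_to_a_single_allowed_cpu() {
        let before = sched_getaffinity(Pid::this()).unwrap();
        limit_affinity().unwrap();
        let after = sched_getaffinity(Pid::this()).unwrap();

        let pinned: Vec<usize> = (0..CpuSet::count())
            .filter(|&cpu| after.is_set(cpu).unwrap_or(false))
            .collect();
        assert_eq!(pinned.len(), 1);
        assert!(before.is_set(pinned[0]).unwrap());
    }
}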

xd009642 commented 3 years ago

Yes please (also don't forget to update changelog) :smile: