microsoft / ebpf-for-windows

eBPF implementation that runs on top of Windows
MIT License

Native eBPF program attach failure causes the next valid attempt to fail #2133

Open mtfriesen opened 1 year ago

mtfriesen commented 1 year ago

Describe the bug

If an application tries to open+load+attach the same native program immediately after it failed to attach, eBPF fails to load the program on the second attempt, seemingly because the first driver is still unloading.

[0]0C1C.0FD0::2023/02/24-10:04:25.996585800 [xdpfntest] TryAttachEbpfXdpProgram:3817 bpf_xdp_attach failed: -22
[0]0C1C.0FD0::2023/02/24-10:04:26.026993200 [xdpfntest] TryAttachEbpfXdpProgram:3796 bpf_object__load failed: -2

Separately, the same native eBPF program cannot be loaded concurrently, but it's unclear whether that is by design.

OS information

20348.859.amd64fre.fe_release_svc_prod2.220707-1832

Steps taken to reproduce bug

  1. Attach eBPF program A (native or JIT) to an XDP interface
  2. Open, load, attach native eBPF program B to the same XDP interface - this will be rejected by XDP
  3. bpf_object__close native program B.
  4. Attach native eBPF program B to the same XDP interface using XDP_FLAGS_REPLACE

Expected behavior

The attach in step 4 should succeed.

Actual outcome

The attach in step 4 fails unless the caller delays the thread between the close in step 3 and the attach in step 4 long enough for the driver loaded in step 2 to unload.
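
Until the unload race is fixed, a caller can replace a fixed sleep with a bounded retry. The following is only a minimal sketch of that workaround: `RetryWithDelay` is a hypothetical helper, not part of eBPF for Windows or libbpf, and it assumes the wrapped operation returns 0 on success and a non-zero error code otherwise.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not an eBPF for Windows or libbpf API).
// Calls `attempt` until it returns 0 or `maxAttempts` tries have been made,
// sleeping `delay` between tries. Returns the result of the last attempt.
inline int RetryWithDelay(
    const std::function<int()> &attempt,
    int maxAttempts,
    std::chrono::milliseconds delay)
{
    assert(maxAttempts > 0);
    int result = -1;
    for (int i = 0; i < maxAttempts; ++i) {
        result = attempt();
        if (result == 0) {
            break; // Succeeded, stop retrying.
        }
        if (i + 1 < maxAttempts) {
            // Give the previously loaded driver time to finish unloading.
            std::this_thread::sleep_for(delay);
        }
    }
    return result;
}
```

In a test, the `attempt` callable would wrap the whole open+load+attach sequence, since the failure shows up in the load step of the second attempt. This only bounds the wait; it does not remove the underlying race.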

Additional details

ebpf_native_load_fail.log

Concrete code:

using unique_bpf_object = wistd::unique_ptr<bpf_object, wil::function_deleter<decltype(&::bpf_object__close), ::bpf_object__close>>;

static
HRESULT
TryAttachEbpfXdpProgram(
    _Out_ unique_bpf_object &BpfObject,
    _In_ const TestInterface &If,
    _In_ const CHAR *BpfRelativeFileName,
    _In_ const CHAR *BpfProgramName,
    _In_ INT AttachFlags = 0
    )
{
    HRESULT Result;
    CHAR Path[MAX_PATH];
    std::string BpfAbsoluteFileName;
    bpf_program *Program;
    int ProgramFd;
    int ErrnoResult;

    Result = GetCurrentBinaryPath(Path, RTL_NUMBER_OF(Path));
    if (FAILED(Result)) {
        goto Exit;
    }

    BpfAbsoluteFileName = Path;
    BpfAbsoluteFileName += BpfRelativeFileName;

    BpfObject.reset(bpf_object__open(BpfAbsoluteFileName.c_str()));
    if (BpfObject.get() == NULL) {
        TraceError("bpf_object__open failed: %d", errno);
        Result = E_FAIL;
        goto Exit;
    }

    ErrnoResult = bpf_object__load(BpfObject.get());
    if (ErrnoResult != 0) {
        TraceError("bpf_object__load failed: %d, errno=%d", ErrnoResult, errno);
        Result = E_FAIL;
        goto Exit;
    }

    Program = bpf_object__find_program_by_name(BpfObject.get(), BpfProgramName);
    if (Program == NULL) {
        TraceError("bpf_object__find_program_by_name failed: %d", errno);
        Result = E_FAIL;
        goto Exit;
    }

    ProgramFd = bpf_program__fd(Program);
    if (ProgramFd < 0) {
        TraceError("bpf_program__fd failed: %d", errno);
        Result = E_FAIL;
        goto Exit;
    }

    ErrnoResult = bpf_xdp_attach(If.GetIfIndex(), ProgramFd, AttachFlags, NULL);
    if (ErrnoResult != 0) {
        TraceError("bpf_xdp_attach failed: %d, errno=%d", ErrnoResult, errno);
        Result = E_FAIL;
        goto Exit;
    }

    Result = S_OK;

Exit:

    if (FAILED(Result)) {
        BpfObject.reset();
    }

    return Result;
}

VOID
GenericRxEbpfAttach()
{
    auto If = FnMpIf;

    unique_bpf_object BpfObject = AttachEbpfXdpProgram(If, "\\bpf\\drop.o", "drop");

    unique_bpf_object BpfObjectReplacement;
    TEST_TRUE(FAILED(TryAttachEbpfXdpProgram(BpfObjectReplacement, If, "\\bpf\\pass.sys", "pass")));

    //
    // TODO: eBPF doesn't wait for the pass.sys driver to completely unload
    // after tearing down the object, so allow some time for that to happen
    // before retrying with the replace flag.
    //
    Sleep(TEST_TIMEOUT_ASYNC_MS);
    BpfObjectReplacement =
        AttachEbpfXdpProgram(If, "\\bpf\\pass.sys", "pass", XDP_FLAGS_REPLACE);
}

saxena-anurag commented 1 year ago

@mtfriesen can you confirm whether step 2 in the repro steps above also unloaded eBPF program B before moving to step 3? If the test is unloading program B in step 2, can you try the following and see if it still reproduces?

  1. Do not unload the program in step 2.
  2. Re-attach the same eBPF program B (which was loaded in step 2) in step 3.

mtfriesen commented 1 year ago

Yup, I've updated the repro steps to include the bpf_object__close between original steps 2 and 3.

Confirmed: there are no issues when the two bpf_xdp_attach calls are issued directly in sequence, i.e. when the BPF object created in step (2) is kept open and reused for original step (3).

mtfriesen commented 11 months ago

@shankarseal could this be prioritized? I now need to work around this eBPF bug in another project's test cases.

Alan-Jowett commented 9 months ago

Can't make it in 2401

shankarseal commented 3 months ago

@Alan-Jowett -- can you make a short-term fix to return EBUSY for the scenario mentioned in this issue? I am moving this to 2408.
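
If the load path did return EBUSY for this scenario, a caller could tell "driver still unloading" apart from genuine failures and retry only the former. The sketch below assumes that proposed behavior, which is not what the logs above show today (the load currently fails with -2). `RetryWhileBusy` is a hypothetical name, and the negative-errno return convention is assumed from the libbpf-style calls in the repro code.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: assumes `load` returns 0 on success or a negative
// errno value, and that a still-unloading driver is reported as -EBUSY
// (the short-term fix proposed in this issue, not current behavior).
inline int RetryWhileBusy(
    const std::function<int()> &load,
    int maxAttempts,
    std::chrono::milliseconds delay)
{
    assert(maxAttempts > 0);
    int result = 0;
    for (int i = 0; i < maxAttempts; ++i) {
        result = load();
        if (result != -EBUSY) {
            return result; // Success, or an error that retrying will not fix.
        }
        if (i + 1 < maxAttempts) {
            std::this_thread::sleep_for(delay);
        }
    }
    return result; // Still busy after the last attempt.
}
```

Errors such as -EINVAL are returned immediately, so a legitimately rejected attach is not masked by the retry loop.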

Alan-Jowett commented 2 months ago

Still blocked on multi-program fix.