microsoft / ebpf-for-windows

eBPF implementation that runs on top of Windows
MIT License
2.95k stars 240 forks

Workflow failed - km_mt_stress_tests #3607

Closed github-actions[bot] closed 1 month ago

github-actions[bot] commented 5 months ago

Failed Run Codebase Test name - km_mt_stress_tests

shpalani commented 5 months ago

Known issue: ebpf-for-windows.msi installation failed in CI/CD.

dv-msft commented 5 months ago

This is a duplicate of #3602. Keeping this open to avoid CI/CD noise.

github-actions[bot] commented 5 months ago

Failed Run Codebase Test name - km_mt_stress_tests

shpalani commented 4 months ago

Failure Log:

[01:47:12] :: Starting test *** native_invoke_v4_v6_programs_restart_extension_test ***
[01:47:12] :: test threads per program    : 2
[01:47:12] :: test duration (in minutes)  : 5
[01:47:12] :: test verbose output         : false
[01:47:12] :: test extension restart      : false
[01:47:12] :: waiting on 2 test threads...
[01:47:12] :: **_load_attach_program(0) FATAL ERROR: bpf_prog_attach(cgroup_count_connect4.sys) failed.** program:count_tcp_connect4, errno:22
[01:47:12] :: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[01:47:12] :: ebpf_stress_tests_km is a Catch2 v3.6.0 host application.
[01:47:12] :: Run with -? for options
[01:47:12] :: -------------------------------------------------------------------------------
[01:47:12] :: native_invoke_v4_v6_programs_restart_extension_test
[01:47:12] :: -------------------------------------------------------------------------------
[01:47:12] :: D:\a\ebpf-for-windows\ebpf-for-windows\tests\stress\km\stress_tests_km.cpp(1469)
[01:47:12] :: ...............................................................................
[01:47:13] :: D:\a\ebpf-for-windows\ebpf-for-windows\tests\stress\km\stress_tests_km.cpp(822): FAILED:
[01:47:13] ::   REQUIRE( result == 0 )
[01:47:13] :: with expansion:
[01:47:13] ::   -22 == 0
[01:47:13] :: 

[01:47:13] :: *** ERROR *** C:\eBPF\Run-Self-Hosted-Runner-Test.ps1: C:\eBPF\ebpf_stress_tests_km failed.
*** ERROR *** C:\eBPF\Run-Self-Hosted-Runner-Test.ps1: C:\eBPF\ebpf_stress_tests_km failed.
At C:\actions_runner_2019_1\_work\ebpf-for-windows\ebpf-for-windows\x64\Release\vm_run_tests.psm1:33 char:5
+     Invoke-Command -VMName $VMName -Credential $TestCredential -Scrip ...
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (*** ERROR *** C...ests_km failed.:String) [], RuntimeException
    + FullyQualifiedErrorId : *** ERROR *** C:\eBPF\Run-Self-Hosted-Runner-Test.ps1: C:\eBPF\ebpf_stress_tests_km fail 
   ed.
    + PSComputerName        : vm1_ws2019

Error: Process completed with exit code 1.

github-actions[bot] commented 3 months ago

Failed Run Codebase Test name - km_mt_stress_tests

matthewige commented 3 months ago

From https://github.com/microsoft/ebpf-for-windows/actions/runs/10449590580/job/28932788838

[02:40:35] :: native_invoke_v4_v6_programs_restart_extension_test
[02:40:35] :: -------------------------------------------------------------------------------
[02:40:35] :: D:\a\ebpf-for-windows\ebpf-for-windows\tests\stress\km\stress_tests_km.cpp(1469)
[02:40:35] :: ...............................................................................
[02:40:35] :: D:\a\ebpf-for-windows\ebpf-for-windows\tests\stress\km\stress_tests_km.cpp(822): FAILED:
[02:40:35] ::   REQUIRE( result == 0 )
[02:40:35] :: with expansion:
[02:40:35] ::   -22 == 0
[02:40:35] ::

This test case has 2 threads, one loading/attaching a connect v4 program and the other loading a v6 program.

Program attach is failing (line 822 mentioned above):

    result = bpf_prog_attach(program_fd, UNSPECIFIED_COMPARTMENT_ID, attach_type, 0);
    if (result != 0) {
        LOG_ERROR(
            "{}({}) FATAL ERROR: bpf_prog_attach({}) failed. program:{}, errno:{}",
            __func__,
            thread_index,
            file_name.c_str(),
            program->program_name,
            errno);
        REQUIRE(result == 0);
    }
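For anyone decoding the numbers: the -22 in the Catch2 expansion and the errno:22 in the log line both match EINVAL, assuming bpf_prog_attach follows the libbpf convention of returning a negative errno value. A minimal sketch of that mapping (describe_attach_result is a hypothetical helper, not part of the test code):

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper (not in the repo): turns a libbpf-style result
// (0 on success, -errno on failure) into a readable string.
std::string describe_attach_result(int result)
{
    // Assumption: results are never positive under the -errno convention.
    assert(result <= 0);
    if (result == 0) {
        return "success";
    }
    // The strerror text is platform-dependent, so only the number is stable.
    return "error " + std::to_string(-result) + ": " + std::strerror(-result);
}
```

Calling describe_attach_result(-22) yields a string starting with "error 22", followed by the platform's description of EINVAL.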

From the failure traces:

1132290 [0]0F40.0A84::2024/08/19-02:40:27.557685400 [NetEbpfExtProvider]{"api":"FwpmTransactionBegin","status":"0xC022000E(NT=The call is not allowed from within an explicit transaction.)","meta":{"provider":"NetEbpfExtProvider","event":"NetEbpfExtApiError","time":"2024-08-19T09:40:27.5576854Z","cpu":0,"pid":3904,"tid":2692,"channel":11,"level":2,"keywords":"0x4"}}
1132291 [0]0F40.0A84::2024/08/19-02:40:27.557712600 [NetEbpfExtProvider]{"ErrorMessage":"net_ebpf_extension_add_wfp_filters returned error","Error":6,"meta":{"provider":"NetEbpfExtProvider","event":"NetEbpfExtGenericError","time":"2024-08-19T09:40:27.5577126Z","cpu":0,"pid":3904,"tid":2692,"channel":11,"level":2,"keywords":"0x2"}}
1132292 [0]0F40.0A84::2024/08/19-02:40:27.557715200 [NetEbpfExtProvider]{"ErrorMessage":"_net_ebpf_extension_sock_addr_on_client_attach returned error","Error":6,"meta":{"provider":"NetEbpfExtProvider","event":"NetEbpfExtGenericError","time":"2024-08-19T09:40:27.5577152Z","cpu":0,"pid":3904,"tid":2692,"channel":11,"level":2,"keywords":"0x2"}}
1132293 [0]0F40.0A84::2024/08/19-02:40:27.557735600 [NetEbpfExtProvider]{"ErrorMessage":"_net_ebpf_extension_sock_addr_on_client_attach returned error","Error":6,"meta":{"provider":"NetEbpfExtProvider","event":"NetEbpfExtGenericError","time":"2024-08-19T09:40:27.5577356Z","cpu":0,"pid":3904,"tid":2692,"channel":11,"level":2,"keywords":"0x2"}}
1132294 [0]0F40.0A84::2024/08/19-02:40:27.557737900 [NetEbpfExtProvider]{"Message":"attach_callback returned failure. Attach attempt rejected.","value":6,"meta":{"provider":"NetEbpfExtProvider","event":"NetEbpfExtGenericMessage","time":"2024-08-19T09:40:27.5577379Z","cpu":0,"pid":3904,"tid":2692,"channel":11,"level":2,"keywords":"0x4"}}

Upon code inspection, multiple threads can invoke net_ebpf_extension_add_wfp_filters() concurrently. These can be attaching multiple programs of the same type or programs of different types, and all of them could hit this issue. We use a single global filter engine handle, so both threads call FwpmTransactionBegin() on that same handle. The second call arrives while the first transaction is still open, which would cause the failure observed above.

We'll need to fix this, probably by adding some serialization and/or using different filter handles per operation.
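To illustrate the serialization option, here is a minimal sketch that does not call the real WFP APIs. mock_engine_t is a hypothetical stand-in for the shared engine handle that, like the behavior in the traces, refuses to begin a transaction while one is already open. g_engine_lock, add_filters_serialized and run_concurrent_attaches are likewise made-up names and not the extension's actual code.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared filter engine handle: beginning a
// transaction fails if another one is already open on the same handle.
struct mock_engine_t
{
    std::atomic<bool> in_transaction{false};

    // Returns true on success, false if a transaction is already in progress.
    bool transaction_begin()
    {
        bool expected = false;
        return in_transaction.compare_exchange_strong(expected, true);
    }

    void transaction_commit()
    {
        assert(in_transaction.load()); // Committing without begin is a bug.
        in_transaction.store(false);
    }
};

static mock_engine_t g_engine;   // Shared, like the global engine handle.
static std::mutex g_engine_lock; // Hypothetical lock serializing transactions.

// Adds "filters" inside a transaction. The lock guarantees that only one
// thread at a time sits between begin and commit on the shared handle.
bool add_filters_serialized(int& filters_added)
{
    std::lock_guard<std::mutex> guard(g_engine_lock);
    if (!g_engine.transaction_begin()) {
        return false;
    }
    ++filters_added; // Placeholder for the real filter-add calls.
    g_engine.transaction_commit();
    return true;
}

// Runs attach attempts from several threads and returns the failure count.
int run_concurrent_attaches(int thread_count, int attempts_per_thread)
{
    std::atomic<int> failures{0};
    int filters_added = 0; // Only modified while g_engine_lock is held.
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < attempts_per_thread; ++i) {
                if (!add_filters_serialized(filters_added)) {
                    ++failures;
                }
            }
        });
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return failures.load();
}
```

With the lock in place, run_concurrent_attaches(2, 1000) should report zero failures. Removing the lock_guard lets two threads race on transaction_begin(), which is the analogue of the FwpmTransactionBegin error in the traces. Using a separate engine handle per operation would avoid the shared state instead of guarding it.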

matthewige commented 3 months ago

After some offline discussion, the codepaths needed to fix this are also being modified in PR 3751 (https://github.com/microsoft/ebpf-for-windows/pull/3751). I will wait until that is completed before taking up this fix.

github-actions[bot] commented 3 months ago

Failed Run Codebase Test name - km_mt_stress_tests
