pixie-io / pixie

Instant Kubernetes-Native Application Observability
https://px.dev
Apache License 2.0

Deflake `stirling_wrapper_container_bpf_test` #700

Closed vihangm closed 1 year ago

vihangm commented 1 year ago

`//src/stirling/e2e_tests:stirling_wrapper_container_bpf_test` has a pass rate below 90% in CI. De-flake the test, run it locally with `runs_per_test` set to a large number to confirm that the pass rate is above 90%, and then re-enable it in CI by removing the `disabled_flaky_test` tag.
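
For the local check, a minimal sketch of the pass-rate criterion is below. It assumes the per-run outcomes have already been collected into a list of booleans; the function names and the way the results are gathered are hypothetical and not part of the Pixie tooling.

```python
def pass_rate(results):
    """Fraction of runs that passed; `results` is a list of booleans, one per run."""
    if not results:
        raise ValueError("need at least one test run")
    return sum(1 for passed in results if passed) / len(results)


def meets_threshold(results, threshold=0.90):
    """True only when the pass rate is strictly above the threshold (90% here)."""
    return pass_rate(results) > threshold
```

With a small number of runs the measured rate is noisy, which is why a large `runs_per_test` value matters before re-enabling the test.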

ddelnano commented 1 year ago

After reviewing the BuildBuddy logs for this test, it appears that these failures occur when Stirling is unable to attach uprobes. This seems to happen specifically for the GoTLS and HTTP2 uprobes, and the attach fails with the following error(s):

I20230119 17:03:25.107693 10415 stirling_wrapper.cc:345] Running for 0 seconds.
cannot attach uprobe, Device or resource busy
W20230119 17:03:25.187938 10463 uprobe_manager.cc:774] Failed to attach HTTP2 Uprobes to /proc/10320/root/root/.cache/bazel/_bazel_root/54060b0ed2e63c063d495ae4fb1a7d19/execroot/px/bazel-out/k8-opt-ST-b5a6a59c0816/bin/src/stirling/source_connectors/socket_tracer/protocols/http2/testing/go_grpc_server/server_/server: Internal : Unable to attach uprobe for binary /proc/10320/root/root/.cache/bazel/_bazel_root/54060b0ed2e63c063d495ae4fb1a7d19/execroot/px/bazel-out/k8-opt-ST-b5a6a59c0816/bin/src/stirling/source_connectors/socket_tracer/protocols/http2/testing/go_grpc_server/server_/server symbol google.golang.org/grpc/internal/transport.(*http2Server).operateHeaders addr 0 offset 0 using probe_http2_server_operate_headers
I20230119 17:03:25.251143 10463 uprobe_manager.cc:869] Number of uprobes deployed = 3377
I20230119 17:03:25.251441 10462 stirling.cc:787] Stirling is running.
I20230119 17:03:27.945524 10415 env.cc:91] CPU usage: 11.3% user, 6.0% system, 17.3% total
I20230119 17:03:27.994982 10415 env.cc:51] Shutting down
Error/Warning count = 1

I20230119 03:40:03.555701  9420 stirling_wrapper.cc:345] Running for 0 seconds.
cannot attach uprobe, Device or resource busy
W20230119 03:40:03.665980  9468 uprobe_manager.cc:760] Failed to attach GoTLS Uprobes to /proc/9350/root/root/.cache/bazel/_bazel_root/54060b0ed2e63c063d495ae4fb1a7d19/execroot/px/bazel-out/k8-opt-ST-b5a6a59c0816/bin/src/stirling/source_connectors/socket_tracer/protocols/http2/testing/go_grpc_server/server_/server: Internal : Unable to attach uprobe for binary /proc/9350/root/root/.cache/bazel/_bazel_root/54060b0ed2e63c063d495ae4fb1a7d19/execroot/px/bazel-out/k8-opt-ST-b5a6a59c0816/bin/src/stirling/source_connectors/socket_tracer/protocols/http2/testing/go_grpc_server/server_/server symbol  addr 5fa599 offset 0 using probe_return_tls_conn_read
I20230119 03:40:03.728276  9468 uprobe_manager.cc:869] Number of uprobes deployed = 3401
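
To tally how often each uprobe type fails across many runs, something like the following could be used. It is only a sketch: it assumes the warning lines keep the `Failed to attach <kind> Uprobes to <path>: ` shape seen in the two logs above, and `count_attach_failures` is a hypothetical helper, not something that exists in the repo.

```python
import re
from collections import Counter

# Matches Stirling warnings of the form seen in the logs above, e.g.
#   "... uprobe_manager.cc:774] Failed to attach HTTP2 Uprobes to <path>: Internal : ..."
# Assumption: the binary path contains no ": " sequence.
ATTACH_FAILURE = re.compile(r"Failed to attach (\w+) Uprobes to (\S+?): ")


def count_attach_failures(log_text):
    """Return a Counter mapping uprobe kind (e.g. 'HTTP2', 'GoTLS') to failure count."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ATTACH_FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Running it over the concatenated logs from a batch of runs would show whether the failures stay confined to the GoTLS and HTTP2 uprobes.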

ddelnano commented 1 year ago

This is fixed now that #1339 was merged.