open-telemetry / opentelemetry-network

eBPF Collector
https://opentelemetry.io
Apache License 2.0

NAT handling for IPv6 formatted IPv4 addresses #255

Closed golisai closed 8 months ago

golisai commented 8 months ago

Description:
This patch fixes an issue in dual-stack setups where NAT mappings are not applied to IPv4 addresses reported in IPv6 (IPv4-mapped) form. As a result, the Kubernetes service DNS name is reported as "dest.workload.name" with "dest.resolution.type" set to DNS, rather than the actual backend Kubernetes workload name. The fix ensures that the NAT handler is invoked for IPv6 addresses that are actually IPv4-mapped addresses.
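For context, an IPv4-mapped IPv6 address carries an ordinary IPv4 address in the form ::ffff:a.b.c.d. The following is a minimal standalone sketch, not the actual patch, of how an IPv6 code path can detect such addresses and hand them to IPv4 NAT handling; it uses only the standard IN6_IS_ADDR_V4MAPPED macro, and handle_nat_ipv4 is a hypothetical stand-in for the collector's real NAT handler.

```c++
// Sketch only: detect IPv4-mapped IPv6 addresses and dispatch them to the
// IPv4 NAT path. handle_nat_ipv4() is a hypothetical placeholder, not an
// opentelemetry-network API.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

static void handle_nat_ipv4(uint32_t saddr, uint16_t sport,
                            uint32_t daddr, uint16_t dport) {
  char src[INET_ADDRSTRLEN], dst[INET_ADDRSTRLEN];
  inet_ntop(AF_INET, &saddr, src, sizeof(src));
  inet_ntop(AF_INET, &daddr, dst, sizeof(dst));
  std::printf("NAT lookup for %s:%u -> %s:%u\n",
              src, unsigned(sport), dst, unsigned(dport));
}

static void handle_set_state_ipv6(const in6_addr &saddr, uint16_t sport,
                                  const in6_addr &daddr, uint16_t dport) {
  // ::ffff:a.b.c.d addresses are IPv4 connections reported on the IPv6 path;
  // extract the embedded IPv4 addresses so the NAT mapping is still applied.
  if (IN6_IS_ADDR_V4MAPPED(&saddr) && IN6_IS_ADDR_V4MAPPED(&daddr)) {
    uint32_t s4, d4;
    std::memcpy(&s4, &saddr.s6_addr[12], sizeof(s4));
    std::memcpy(&d4, &daddr.s6_addr[12], sizeof(d4));
    handle_nat_ipv4(s4, sport, d4, dport);
    return;
  }
  // ... genuine IPv6 handling would continue here ...
}

int main() {
  // The addresses from the kernel collector log below: an IPv4 connection
  // observed on the IPv6 code path.
  in6_addr src{}, dst{};
  inet_pton(AF_INET6, "::ffff:10.1.235.218", &src);
  inet_pton(AF_INET6, "::ffff:10.152.183.176", &dst);
  handle_set_state_ipv6(src, 47274, dst, 24317);
  return 0;
}
```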

Link to tracking Issue: None

Testing: Verified by capturing the metric with Prometheus.

Prior to fix: tcp_bytes{az_equal="true", container="reducer", dest_availability_zone="(unknown)", dest_environment="(unknown)", dest_resolution_type="DNS", dest_workload_name="otel-collector-service.otel.svc.cluster.local", sf_product="network-explorer", source_availability_zone="(unknown)", source_container_name="reducer", source_environment="(unknown)", source_namespace_name="otel-network", source_process_name="opentelemetry-e", source_resolution_type="K8S_CONTAINER", source_workload_name="otel-network-reducer", source_workload_uid="6f7a2e46-a742-4876-8539-f9704e7f49bc"}

After fix:

tcp_bytes{az_equal="true", container="reducer", dest_availability_zone="(unknown)", dest_container_name="otc-container", dest_environment="(unknown)", dest_image_version="23.12.0-1247", dest_namespace_name="otel", dest_process_name="otel-collector", dest_resolution_type="K8S_CONTAINER", dest_workload_name="otel-collector", dest_workload_uid="13a067a2-7eb9-4745-9160-b646789cd2f3", sf_product="network-explorer", source_availability_zone="(unknown)", source_container_name="reducer", source_environment="(unknown)", source_namespace_name="otel-network", source_process_name="opentelemetry-e", source_resolution_type="K8S_CONTAINER", source_workload_name="otel-network-reducer", source_workload_uid="6f7a2e46-a742-4876-8539-f9704e7f49bc"}

Kernel collector logs after the fix:

2024-03-06 19:02:16.971037+00:00 trace [p:414024 t:414024] handle_set_state_ipv6: sk:18446628591012088192, ::ffff:10.1.235.218:47274 -> ::ffff:10.152.183.176:24317 (tx_rx=0)
2024-03-06 19:02:16.971039+00:00 trace [p:414024 t:414024] NatHandler::handle_set_state_ipv6: sk=18446628591012088192, src=10.1.235.218:47274, dest=10.152.183.176:24317, tx_rx=0
2024-03-06 19:02:16.971042+00:00 trace [p:414024 t:414024] NatHandler::send_nat_remapping: sk=18446628591012088192, src=10.1.235.218:47274, dst=10.1.36.98:24317
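To illustrate what the send_nat_remapping line represents (the service ClusterIP 10.152.183.176:24317 being rewritten to the backend pod address 10.1.36.98:24317), here is a hypothetical lookup-table sketch keyed on the observed destination endpoint. All names and types below are illustrative only and are not the collector's real data structures, which derive the remapping from the kernel's connection-tracking state.

```c++
// Hypothetical NAT remapping table: maps the destination the client connected
// to (the Kubernetes service VIP) to the address traffic was DNAT'ed to (the
// backend pod). Illustrative only, not the collector's implementation.
#include <cstdint>
#include <cstdio>
#include <map>
#include <optional>
#include <string>
#include <utility>

using Endpoint = std::pair<std::string, uint16_t>;  // address, port

class NatTable {
 public:
  void record(const Endpoint &observed, const Endpoint &translated) {
    remappings_[observed] = translated;
  }

  // Returns the translated endpoint if the observed destination was NATed.
  std::optional<Endpoint> resolve(const Endpoint &observed) const {
    auto it = remappings_.find(observed);
    if (it == remappings_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<Endpoint, Endpoint> remappings_;
};

int main() {
  NatTable nat;
  // The remapping from the log: service VIP -> backend pod.
  nat.record({"10.152.183.176", 24317}, {"10.1.36.98", 24317});

  if (auto pod = nat.resolve({"10.152.183.176", 24317})) {
    // With the remapping applied, the flow is attributed to the pod, so the
    // reducer can fill dest_workload_name from Kubernetes metadata instead of
    // falling back to DNS resolution of the service name.
    std::printf("NAT remap: 10.152.183.176:24317 -> %s:%u\n",
                pod->first.c_str(), unsigned(pod->second));
  }
  return 0;
}
```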

Documentation: None

yonch commented 8 months ago

Hi Sri, it’s going to take a few days for me to get around to this. Please reach out on CNCF Slack if this is tied to a deadline.

samiura commented 8 months ago

@golisai is it possible for you to look at the CI/CD failures?

golisai commented 8 months ago

@golisai is it possible for you to look at the CI/CD failures?

@samiura Both failed cases timed out while pulling the build-env container. I am not sure how to debug this. Could it be a temporary issue on the registry side?

golisai commented 8 months ago

lgtm. Thank you for solving this misbehavior! 🥇

Thanks