open-telemetry / opentelemetry-network

eBPF Collector
https://opentelemetry.io
Apache License 2.0

Not able to run my-opentelemetry-ebpf-kernel-collector #270

Open bran1501 opened 3 months ago

bran1501 commented 3 months ago

What happened?

Description

While deploying the eBPF Helm chart from https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-ebpf/values.yaml, I configured the endpoint, but once it starts, the daemonset my-opentelemetry-ebpf-kernel-collector fails with these errors:

  Err:1 https://nvidia.github.io/libnvidia-container/stable/deb/amd64 InRelease
    The following signatures couldn't be verified because the public key is not available: NO_PUBKEY DDCAE044F796ECB0
  Reading package lists...
  W: GPG error: https://nvidia.github.io/libnvidia-container/stable/deb/amd64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY DDCAE044F796ECB0
  E: The repository 'https://nvidia.github.io/libnvidia-container/stable/deb/amd64 InRelease' is not signed.

Steps to Reproduce

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update open-telemetry
sudo apt-get install --yes linux-headers-$(uname -r)

ebpf.yaml:

endpoint:
  address: " my-splunk-otel-collector.otel.svc.cluster.local"
kernelCollector:
  image:
    tag: "v0.10.2"
    name: opentelemetry-ebpf-kernel-collector

helm --namespace=otel install my-opentelemetry-ebpf -f ebpf.yaml open-telemetry/opentelemetry-ebpf

Expected Result

Pod should be able to download the respective dependencies.

Actual Result

Pod not able to resolve the dependencies.

eBPF Collector version

0.10.2

Environment information

Environment

OS:

christhianb@christhianb-k8s:~$ cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.6 LTS"
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
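
The environment above reports Ubuntu 20.04 (focal), yet the apt-get update output in the log further down fetches jammy repositories through /hostfs/etc/apt. The thread does not explain the difference. One possibility, unconfirmed here, is that the /hostfs seen by the pod is not the machine where cat /etc/*release was run (a different node, or a node container). A minimal sketch for comparing the two, assuming the /hostfs/etc/os-release path shown in the log; the function name describe_root is hypothetical:

```shell
#!/usr/bin/env bash
# Print the distro name and codename recorded under a given root directory.
# Usage: describe_root /        (on the machine itself)
#        describe_root /hostfs  (inside the kernel-collector container)
describe_root() {
  local root="${1:-/}"
  local os_release="${root%/}/etc/os-release"
  if [ ! -r "${os_release}" ]; then
    echo "no os-release under ${root}" >&2
    return 1
  fi
  # Source the file in a subshell so its variables do not leak into the caller.
  (
    . "${os_release}"
    echo "${PRETTY_NAME:-unknown} (${VERSION_CODENAME:-unknown})"
  )
}
```

If the two outputs differ, installing linux-headers on the first machine would not be expected to change what the collector finds.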

eBPF Collector configuration

endpoint:
  address: " my-splunk-otel-collector.otel.svc.cluster.local"
kernelCollector:
  image:
    tag: "v0.10.2"
    name: opentelemetry-ebpf-kernel-collector

Log output

resolving kernel headers...
cleaning up stale kprobes...
--- BEGIN log from kernel headers resolution with error 'kernel_headers_misconfigured_repo': -------------
+ kernel_headers_info_path=/var/run/ebpf_net/kernel_headers.cfg
++ uname -r
+ kernel_version=5.15.0-1065-gcp
+ kernel_headers_usr_src_base_path=/usr/src
+ kernel_headers_lib_modules_base_path=/lib/modules
+ host_dir=/hostfs
+ host_etc_dir=/hostfs/etc
+ host_yum_vars_dir=/hostfs/etc/yum/vars
+ host_cache_dir=/hostfs/cache/ebpf_net
+ host_usr_src_dir=/hostfs/usr/src
+ host_lib_modules_dir=/hostfs/lib/modules
+ host_kernel_headers_dir=/hostfs/lib/modules/5.15.0-1065-gcp
+ host_cache_kernel_headers_dir=/hostfs/cache/ebpf_net/kernel-headers
+ host_cache_kernel_headers_archive=/hostfs/cache/ebpf_net/kernel-headers/5.15.0-1065-gcp.tar.gz
+ kernel_headers_lib_modules_path=/lib/modules/5.15.0-1065-gcp
+ kernel_headers_beacon_path=("build/include/linux/tcp.h" "source/include/linux/tcp.h")
+ entrypoint_error=
+ kernel_headers_source=unknown
++ detect_distro
++ debian_os_file=/hostfs/etc/debian_version
++ os_release_file=/hostfs/etc/os-release
++ system_release_file=/hostfs/etc/system-release
++ [[ -e /hostfs/etc/debian_version ]]
++ echo debian
++ return
+ host_distro=debian
+ resolve_kernel_headers
+ check_kernel_headers_installed
+ base_dir=/lib/modules/5.15.0-1065-gcp
+ [[ -n '' ]]
+ for header_file in "${kernel_headers_beacon_path[@]}"
+ [[ -e /lib/modules/5.15.0-1065-gcp/build/include/linux/tcp.h ]]
+ for header_file in "${kernel_headers_beacon_path[@]}"
+ [[ -e /lib/modules/5.15.0-1065-gcp/source/include/linux/tcp.h ]]
+ return 1
+ use_host_kernel_headers
+ [[ -e /usr/src ]]
+ rm -rf /usr/src
+ [[ -e /lib/modules ]]
+ [[ -e /hostfs/usr/src ]]
+ ln -s /hostfs/usr/src /usr/src
+ [[ -e /hostfs/lib/modules ]]
+ ln -s /hostfs/lib/modules /lib/modules
+ check_kernel_headers_installed /hostfs/lib/modules/5.15.0-1065-gcp
+ base_dir=/lib/modules/5.15.0-1065-gcp
+ [[ -n /hostfs/lib/modules/5.15.0-1065-gcp ]]
+ base_dir=/hostfs/lib/modules/5.15.0-1065-gcp
+ for header_file in "${kernel_headers_beacon_path[@]}"
+ [[ -e /hostfs/lib/modules/5.15.0-1065-gcp/build/include/linux/tcp.h ]]
+ for header_file in "${kernel_headers_beacon_path[@]}"
+ [[ -e /hostfs/lib/modules/5.15.0-1065-gcp/source/include/linux/tcp.h ]]
+ return 1
+ check_kernel_headers_installed
+ base_dir=/lib/modules/5.15.0-1065-gcp
+ [[ -n '' ]]
+ for header_file in "${kernel_headers_beacon_path[@]}"
+ [[ -e /lib/modules/5.15.0-1065-gcp/build/include/linux/tcp.h ]]
+ for header_file in "${kernel_headers_beacon_path[@]}"
+ [[ -e /lib/modules/5.15.0-1065-gcp/source/include/linux/tcp.h ]]
+ return 1
+ return 1
+ use_cached_kernel_headers
+ [[ ! -e /hostfs/cache/ebpf_net/kernel-headers/5.15.0-1065-gcp.tar.gz ]]
+ [[ -d /hostfs/cache/ebpf_net/kernel-headers ]]
+ return 1
+ [[ true == \f\a\l\s\e ]]
+ echo 'no kernel headers found, attempting to auto-fetch...'
no kernel headers found, attempting to auto-fetch...
+ install_kernel_headers
+ [[ -e /usr/src ]]
+ rm -rf /usr/src
+ [[ -e /lib/modules ]]
+ rm -rf /lib/modules
+ case "${host_distro}" in
+ install_apt_kernel_headers
+ kernel_headers_pkg_name=linux-headers-5.15.0-1065-gcp
+ sources_list=/hostfs/etc/apt/sources.list
+ [[ ! -e /hostfs/etc/apt/sources.list ]]
+ apt_cmd_args=(--no-install-recommends -o "Dir::Etc::sourcelist=${sources_list}")
+ sources_list_d=/hostfs/etc/apt/sources.list.d
+ [[ -e /hostfs/etc/apt/sources.list.d ]]
+ apt_cmd_args+=(-o "Dir::Etc::sourceparts=${sources_list_d}")
+ trusted_gpg=/hostfs/etc/apt/trusted.gpg
+ [[ -e /hostfs/etc/apt/trusted.gpg ]]
+ apt_cmd_args+=(-o "Dir::Etc::trusted=${trusted_gpg}")
+ trusted_gpg_d=/hostfs/etc/apt/trusted.gpg.d
+ [[ -e /hostfs/etc/apt/trusted.gpg.d ]]
+ apt_cmd_args+=(-o "Dir::Etc::trustedparts=${trusted_gpg_d}")
+ apt-get update --no-install-recommends -o Dir::Etc::sourcelist=/hostfs/etc/apt/sources.list -o Dir::Etc::sourceparts=/hostfs/etc/apt/sources.list.d -o Dir::Etc::trusted=/hostfs/etc/apt/trusted.gpg -o Dir::Etc::trustedparts=/hostfs/etc/apt/trusted.gpg.d
Get:1 https://nvidia.github.io/libnvidia-container/stable/deb/amd64  InRelease [1477 B]
Get:2 https://download.docker.com/linux/ubuntu jammy InRelease [48.8 kB]
Get:3 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Err:1 https://nvidia.github.io/libnvidia-container/stable/deb/amd64  InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY DDCAE044F796ECB0
Get:4 http://archive.ubuntu.com/ubuntu jammy InRelease [270 kB]
Get:5 https://download.docker.com/linux/ubuntu jammy/stable amd64 Packages [44.0 kB]
Get:6 https://downloadcontent.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_22.04  InRelease [1639 B]
Get:7 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [2785 kB]
Get:8 https://downloadcontent.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/1.24/xUbuntu_22.04  InRelease [1632 B]
Get:9 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [1130 kB]
Get:10 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [2115 kB]
Get:11 https://downloadcontent.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_22.04  Packages [8832 B]
Get:12 http://security.ubuntu.com/ubuntu jammy-security/multiverse amd64 Packages [44.7 kB]
Get:13 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:14 https://downloadcontent.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/1.24/xUbuntu_22.04  Packages [2198 B]
Get:15 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Get:16 http://archive.ubuntu.com/ubuntu jammy/restricted amd64 Packages [164 kB]
Get:17 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages [1792 kB]
Get:18 http://archive.ubuntu.com/ubuntu jammy/universe amd64 Packages [17.5 MB]
Get:19 http://archive.ubuntu.com/ubuntu jammy/multiverse amd64 Packages [266 kB]
Get:20 http://archive.ubuntu.com/ubuntu jammy-updates/multiverse amd64 Packages [51.8 kB]
Get:21 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [2393 kB]
Get:22 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages [2882 kB]
Get:23 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1421 kB]
Get:24 http://archive.ubuntu.com/ubuntu jammy-backports/main amd64 Packages [81.0 kB]
Get:25 http://archive.ubuntu.com/ubuntu jammy-backports/universe amd64 Packages [33.7 kB]
Reading package lists...
W: GPG error: https://nvidia.github.io/libnvidia-container/stable/deb/amd64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY DDCAE044F796ECB0
E: The repository 'https://nvidia.github.io/libnvidia-container/stable/deb/amd64  InRelease' is not signed.
+ entrypoint_error=kernel_headers_misconfigured_repo
+ return 1
+ check_kernel_headers_installed
+ base_dir=/lib/modules/5.15.0-1065-gcp
+ [[ -n '' ]]
+ for header_file in "${kernel_headers_beacon_path[@]}"
+ [[ -e /lib/modules/5.15.0-1065-gcp/build/include/linux/tcp.h ]]
+ for header_file in "${kernel_headers_beacon_path[@]}"
+ [[ -e /lib/modules/5.15.0-1065-gcp/source/include/linux/tcp.h ]]
+ return 1
+ return 1
+ [[ -z kernel_headers_misconfigured_repo ]]
+ cat
---  END  log from kernel headers resolution with error 'kernel_headers_misconfigured_repo': -------------
launching kernel collector...
+ exec /srv/kernel-collector --host-distro debian --kernel-headers-source unknown --entrypoint-error kernel_headers_misconfigured_repo --config-file=/etc/network-explorer/config.yaml --disable-nomad-metadata --warning

Unable to use the host's package manager configuration to automatically install kernel headers
for the Linux distro 'debian'.

Please reach out to support and include this log in its entirety so we can diagnose and fix
the problem.

In the meantime, please install kernel headers manually on each host before running
the Kernel Collector.

To manually install kernel headers, follow the instructions below:

  - for Debian/Ubuntu based distros, run:

      sudo apt-get install --yes "linux-headers-`uname -r`"

  - for RedHat based distros like CentOS and Amazon Linux, run:

      sudo yum install -y "kernel-devel-`uname -r`"

Additional context

No response
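
The trace above shows how the entrypoint decides whether headers are present: it looks for build/include/linux/tcp.h or source/include/linux/tcp.h under /lib/modules/<kernel version>, in the container and under /hostfs. Below is a rough re-creation of that check, meant to be run by hand on a node after installing linux-headers-$(uname -r). It is a sketch based only on the trace, not the collector's actual script, and the function name has_kernel_headers is made up for illustration:

```shell
#!/usr/bin/env bash
# Return 0 if either beacon file from the trace exists under the given
# modules directory, 1 otherwise.
has_kernel_headers() {
  local base_dir="$1"
  local beacon
  for beacon in "build/include/linux/tcp.h" "source/include/linux/tcp.h"; do
    if [ -e "${base_dir}/${beacon}" ]; then
      echo "found ${base_dir}/${beacon}"
      return 0
    fi
  done
  echo "no beacon header under ${base_dir}"
  return 1
}

# Check the running kernel's modules directory on this machine.
has_kernel_headers "/lib/modules/$(uname -r)" || true
```

If this succeeds on the machine but the collector still reports missing headers, the pod may be looking at a different filesystem than the one where the headers were installed.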

ganeshardlkar commented 3 months ago

If you are hitting the NO_PUBKEY error, try the steps below:

  1. Remove existing keyrings. This removes any existing Google Cloud public key files, both the primary and backup (~) versions:

       rm /usr/share/keyrings/cloud.google.gpg && rm /usr/share/keyrings/cloud.google.gpg~

  2. Download and convert the new Google Cloud public key:

       wget -q -O - https://packages.cloud.google.com/apt/doc/apt-key.gpg | gpg --dearmor -o /usr/share/keyrings/cloud.google-archive-keyring.gpg

  3. Configure the Google Cloud SDK repository:

       echo "deb [signed-by=/usr/share/keyrings/cloud.google-archive-keyring.gpg] http://packages.cloud.google.com/apt cloud-sdk main" | tee /etc/apt/sources.list.d/google-cloud-sdk.list

  4. Update the package list:

       apt-get update
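
Note that the key in the log, DDCAE044F796ECB0, is reported against the nvidia.github.io/libnvidia-container repository rather than the Google Cloud one, so the steps above may not affect it. Below is a small sketch for locating the apt source entries that mention a given repository and checking whether the keyring named in signed-by exists. The function name find_repo_entries and the default directory are assumptions for illustration. Per the log, the host's files appear under /hostfs/etc/apt inside the collector container:

```shell
#!/usr/bin/env bash
# List apt source lines matching a pattern and report, for the signed-by
# keyring each one names, whether that file exists.
find_repo_entries() {
  local pattern="$1"
  local apt_dir="${2:-/etc/apt}"
  local line keyring
  # -h drops file names from the output, -s silences errors for missing files.
  grep -rhs -- "${pattern}" "${apt_dir}/sources.list" "${apt_dir}/sources.list.d" |
    while IFS= read -r line; do
      echo "entry: ${line}"
      # Extract the path from a [signed-by=/path/to/key.gpg] option, if any.
      keyring="$(printf '%s\n' "${line}" | sed -n 's/.*signed-by=\([^] ,]*\).*/\1/p')"
      if [ -z "${keyring}" ]; then
        echo "  no signed-by option; apt falls back to its trusted keyrings"
      elif [ -e "${keyring}" ]; then
        echo "  keyring present: ${keyring}"
      else
        echo "  keyring missing: ${keyring}"
      fi
    done
}

find_repo_entries "libnvidia-container" || true
```

The log shows the collector passing only the host's trusted.gpg and trusted.gpg.d locations to apt-get, so a signed-by path that exists on the host may not exist at the same path inside the container. Whether that is the cause here is not confirmed in this thread.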

bran1501 commented 3 months ago

Hi @ganeshardlkar, I checked and I don't have those keyrings. I downloaded the GPG key, installed the repo, and updated Ubuntu anyway, but the issue persists:

  Reading package lists...
  W: GPG error: https://nvidia.github.io/libnvidia-container/stable/deb/amd64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY DDCAE044F796ECB0
  E: The repository 'https://nvidia.github.io/libnvidia-container/stable/deb/amd64 InRelease' is not signed.

Unable to use the host's package manager configuration to automatically install kernel headers for the Linux distro 'debian'.

Please reach out to support and include this log in its entirety so we can diagnose and fix the problem.

In the meantime, please install kernel headers manually on each host before running the Kernel Collector.

To manually install kernel headers, follow the instructions below:

ganeshardlkar commented 2 months ago

Hi @bran1501, were you able to find a solution to this issue? If so, could you share your approach? Thanks.

bran1501 commented 2 months ago

@ganeshardlkar I tried different approaches, but it looks like a development issue, since it hasn't been updated for the latest Ubuntu releases.

Momotoculteur commented 1 week ago

Hey 🙋‍♂️ I'm on Minikube and get the same error. Any update on this issue?

Edit: I tried switching from minikube/dockerd to rancher/containerd. It seems better, but I now have another issue on the k8s-collector/k8s-watcher. Should Docker-API-based Kubernetes setups like minikube be avoided, since they prevent access to eBPF in the Docker engine from the host?

  2024/11/19 09:29:39 [Error]: Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp [::1]:8712: connect: connection refused"

There is also an issue on the reducer:

  2024-11-19 09:38:43.848663+00:00 error [p:1 t:19] Logging core failed to publish internal metrics writer stats

and on the kernel-collector:

  END log from kernel headers resolution with error 'unsupported_distro':

Edit 2: I tried Grafana Beyla and it worked like a charm.