open-telemetry / opentelemetry-collector-releases

OpenTelemetry Collector Official Releases
https://opentelemetry.io
Apache License 2.0

Collector execution hangs inside container #683

Open · machadovilaca opened 1 month ago

machadovilaca commented 1 month ago

Describe the bug

Running a custom collector locally works as expected, but when running inside a container (Podman), execution hangs during the initial setup steps.

Steps to reproduce

  1. Create a collector with ocb
  2. Configure components with a receiver from opentelemetry-collector-contrib (see the builder manifest sketch after these steps)
  3. Create Dockerfile
FROM golang:1.23-bullseye

# libvirt development headers, needed to build the custom receiver
RUN apt-get update && apt-get install -y libvirt-dev

# copy the collector sources and the local contrib checkout they depend on
COPY kubevirt-vm-otel-collector /src/kubevirt-vm-otel-collector
COPY opentelemetry-collector-contrib /src/opentelemetry-collector-contrib

WORKDIR /src/kubevirt-vm-otel-collector

RUN go mod download
RUN go build -o kubevirt-vm-otel-collector

# run as a non-root user
ARG USER_UID=10001
USER ${USER_UID}

ENTRYPOINT ["/src/kubevirt-vm-otel-collector/kubevirt-vm-otel-collector"]
CMD ["--config", "/src/kubevirt-vm-otel-collector/config.yaml"]
  4. Build and run the image:
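(the image is assumed to have been built first, e.g. with podman build -t <IMG> . from the Dockerfile above)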
āžœ  podman run <IMG>
2024-10-01T17:56:19.134Z        info    service@v0.109.0/service.go:129 Setting up own telemetry...
2024-10-01T17:56:19.134Z        warn    service@v0.109.0/service.go:196 service::telemetry::metrics::address is being deprecated in favor of service::telemetry::metrics::readers
2024-10-01T17:56:19.134Z        info    service@v0.109.0/telemetry.go:98        Serving metrics {"address": ":8888", "metrics level": "Normal"}
2024-10-01T17:56:19.134Z        info    builders/builders.go:26 Development component. May change in the future.        {"kind": "exporter", "data_type": "metrics", "name": "debug"}
2024-10-01T17:56:19.134Z        debug   builders/builders.go:24 Alpha component. May change in the future.      {"kind": "receiver", "name": "kubevirt_vms_receiver", "data_type": "metrics"}
<exits>
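
For reference, steps 1 and 2 map onto an ocb builder manifest roughly like the one below. The module path for the custom receiver and the pinned versions are assumptions inferred from the logs, not taken from the report:

dist:
  name: kubevirt-vm-otel-collector
  output_path: .

receivers:
  # assumed module path for the custom receiver inside the local contrib checkout
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubevirtvmreceiver v0.109.0

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/debugexporter v0.109.0

replaces:
  # point the receiver module at the local checkout that the Dockerfile copies in
  - github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kubevirtvmreceiver => ../opentelemetry-collector-contrib/receiver/kubevirtvmreceiver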

Expected Result

(observed only when running locally)

āžœ  ./kubevirt-vm-otel-collector --config config.yaml
2024-10-01T19:10:55.839+0100    info    service@v0.109.0/service.go:129 Setting up own telemetry...
2024-10-01T19:10:55.839+0100    warn    service@v0.109.0/service.go:196 service::telemetry::metrics::address is being deprecated in favor of service::telemetry::metrics::readers
2024-10-01T19:10:55.839+0100    info    service@v0.109.0/telemetry.go:98        Serving metrics {"address": ":8888", "metrics level": "Normal"}
2024-10-01T19:10:55.839+0100    info    builders/builders.go:26 Development component. May change in the future.        {"kind": "exporter", "data_type": "metrics", "name": "debug"}
2024-10-01T19:10:55.839+0100    debug   builders/builders.go:24 Alpha component. May change in the future.      {"kind": "receiver", "name": "kubevirt_vms_receiver", "data_type": "metrics"}
2024-10-01T19:10:55.841+0100    info    service@v0.109.0/service.go:213 Starting kubevirt-vm-otel-collector...  {"Version": "0.0.1", "NumCPU": 16}
2024-10-01T19:10:55.841+0100    info    extensions/extensions.go:39     Starting extensions...
2024-10-01T19:10:55.841+0100    info    kubevirtvmreceiver/receiver.go:61       KubeVirt VM Receiver started    {"kind": "receiver", "name": "kubevirt_vms_receiver", "data_type": "metrics"}
2024-10-01T19:10:55.841+0100    info    service@v0.109.0/service.go:239 Everything is ready. Begin running and processing data.
2024-10-01T19:10:55.841+0100    info    localhostgate/featuregate.go:63 The default endpoints for all servers in components have changed to use localhost instead of 0.0.0.0. Disable the feature gate to temporarily revert to the previous default.    {"feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-10-01T19:11:00.841+0100    debug   kubevirtvmreceiver/receiver.go:90       Collecting metrics      {"kind": "receiver", "name": "kubevirt_vms_receiver", "data_type": "metrics"}
2024-10-01T19:11:00.841+0100    debug   kubevirtvmreceiver/receiver.go:98       Connecting to VM on socket      {"kind": "receiver", "name": "kubevirt_vms_receiver", "data_type": "metrics", "socket": "qemu:///session"}
...
dosubot[bot] commented 1 month ago

Hey @machadovilaca šŸ‘‹, I'm here to help you with your bug report. I'm Dosu, a bot that can assist with bug fixes, answer questions, and guide you on becoming a contributor. Let's dive into the issue you're experiencing.

To address the issue of the OpenTelemetry Collector hanging during startup inside a Podman container, consider the following:

  1. Network Configuration: Ensure that the network settings are correctly configured. The default endpoints might be binding to localhost instead of 0.0.0.0, which can cause connectivity issues inside containers. Verify and adjust the configuration if necessary (see the config sketch after this list).

  2. User Permissions: The Dockerfile sets a specific USER_UID. Ensure that this user has the necessary permissions to execute the collector and access required resources.

  3. Resource Limits: Check if there are any resource constraints (CPU, memory) imposed by Podman that might be affecting the startup. Adjust these limits if needed.

  4. Null Maps in Configuration: Ensure that your configuration files do not contain null values. Use {} for empty maps to avoid issues.

  5. Debugging Tools: Utilize the debug exporter to gain insights into what might be causing the hang. This can help identify if the issue is related to specific components or configurations.

  6. Internal Telemetry: Enable internal telemetry to monitor the collector's performance and identify potential bottlenecks.

These steps should help in diagnosing and resolving the startup hang issue.
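
For points 1, 4, 5 and 6, a minimal config sketch (the receiver name matches the reporter's logs; everything else is illustrative, and the metrics address key is the deprecated-but-working form flagged in the logs above):

receivers:
  kubevirt_vms_receiver: {}   # use {} rather than null for an empty map (point 4)

exporters:
  debug:
    verbosity: detailed       # debug exporter output for startup insight (point 5)

service:
  telemetry:
    logs:
      level: debug            # internal telemetry at debug verbosity (point 6)
    metrics:
      address: 0.0.0.0:8888   # bind beyond localhost inside the container (point 1)
  pipelines:
    metrics:
      receivers: [kubevirt_vms_receiver]
      exporters: [debug]

For point 3, resource limits can be inspected or adjusted on the Podman side, for example with the --cpus and --memory flags of podman run.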

To continue talking to Dosu, mention @dosu.

mx-psi commented 1 month ago

From your code it seems like you are using a custom component. Are you able to reproduce this with the official Docker images and components in contrib?

If not, it's hard for us to help you. Can you try attaching a live debugger (e.g. with dlv attach) and printing the stack trace to see where it is stuck?
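
A sketch of that workflow with Podman follows; the container name is illustrative, and it assumes dlv is installed in the image (e.g. via go install github.com/go-delve/delve/cmd/dlv@latest) and that the collector runs as PID 1 thanks to the exec-form ENTRYPOINT:

# run with ptrace allowed so a debugger can attach
podman run --cap-add=SYS_PTRACE --name otelcol-debug <IMG>

# in another shell, attach delve to the collector process (PID 1 in the container)
podman exec -it otelcol-debug dlv attach 1

# inside delve, list goroutines and print the stack of the suspicious one:
# (dlv) goroutines
# (dlv) goroutine 1 bt

If the process is still alive, sending SIGQUIT instead (podman kill --signal=QUIT otelcol-debug) makes the Go runtime dump all goroutine stacks to stderr, which is often enough to see where startup is stuck.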