open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.02k stars 2.33k forks source link

K8s: Collector hangs in not-ready state and no logs #4385

Closed gionapaolini closed 1 year ago

gionapaolini commented 3 years ago

Describe the bug Collector hangs in not ready state, and there are no logs when inspecting the deployment

Steps to reproduce I'm deploying the collector in k8s with the following yaml definition:

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-conf
  labels:
    app: opentelemetry
    component: otel-collector-conf
data:
  otel-collector-config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
      memory_limiter:
        # Same as --mem-ballast-size-mib CLI argument
        ballast_size_mib: 683
        # 80% of maximum memory up to 2G
        limit_mib: 1500
        # 25% of limit up to 2G
        spike_limit_mib: 512
        check_interval: 5s
      tail_sampling:
        decision_wait: 10s
        num_traces: 100
        expected_new_traces_per_sec: 10
        policies:
          [
              {
                name: test-policy-1,
                type: always_sample
              }
          ]
    extensions:
      health_check: {}
      zpages: {}
    exporters:
      jaeger:
        endpoint: "jaeger-collector:14250" # Replace with a real endpoint.
        insecure: true
    service:
      extensions: [health_check, zpages]
      pipelines:
        traces/1:
          receivers: [otlp]
          processors: [tail_sampling, memory_limiter, batch]
          exporters: [jaeger]
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  ports:
  - name: otlp # Default endpoint for OpenTelemetry receiver.
    port: 4317
    protocol: TCP
    targetPort: 4317
  - name: jaeger-grpc # Default endpoint for Jaeger gRPC receiver
    port: 14250
  - name: jaeger-thrift-http # Default endpoint for Jaeger HTTP receiver.
    port: 14268
  - name: metrics # Default endpoint for querying metrics.
    port: 8888
  selector:
    component: otel-collector
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-collector
  minReadySeconds: 5
  progressDeadlineSeconds: 120
  replicas: 1 
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-collector
    spec:
      containers:
      - command:
          - "/otelcol"
          - "--config=/conf/otel-collector-config.yaml"
          - "--mem-ballast-size-mib=683"
        image: otel/opentelemetry-collector-contrib:latest
        name: otel-collector
        resources:
          limits:
            cpu: 1
            memory: 2Gi
          requests:
            cpu: 200m
            memory: 400Mi
        ports:
        - containerPort: 55679 # Default endpoint for ZPages.
        - containerPort: 4317 # Default endpoint for OpenTelemetry receiver.
        - containerPort: 14250 # Default endpoint for Jaeger HTTP receiver.
        - containerPort: 14268 # Default endpoint for Jaeger HTTP receiver.
        - containerPort: 8888  # Default endpoint for querying metrics.
        volumeMounts:
        - name: otel-collector-config-vol
          mountPath: /conf
        livenessProbe:
          httpGet:
            path: /
            port: 13133 # Health Check extension default port.
        readinessProbe:
          httpGet:
            path: /
            port: 13133 # Health Check extension default port.
      volumes:
        - configMap:
            name: otel-collector-conf
            items:
              - key: otel-collector-config
                path: otel-collector-config.yaml
          name: otel-collector-config-vol

What did you expect to see? Collector should be running or at least display error logs when doing kubectl logs deployments/otel-collector

What did you see instead? Collector hangs on not ready and kubectl logs deployments/otel-collector return no logs

What version did you use? Version: v0.30.0

What config did you use? See above

Environment OS: "Ubuntu 20.04"

dmitryax commented 2 years ago

@gionapaolini NotReady usually means that the contatiner cannot be started. If you run kubectl describe on the pod, you will see what's the problem in the events list

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.