open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0

collector Connection reset #3117

Open bongmu opened 2 months ago

bongmu commented 2 months ago

Component(s)

No response

Describe the issue you're reporting


[root@k8s-1 ~]# kubectl get pods -n opentelemetry-operator-system             
NAME                                                         READY   STATUS    RESTARTS   AGE
demo-collector-6b6c886f49-ctkpd                              1/1     Running   0          68m
opentelemetry-operator-controller-manager-5fb5859986-zjrbv   2/2     Running   0          74m
[root@k8s-1 ~]# 
[root@k8s-1 ~]# kubectl get svc -n opentelemetry-operator-system 
NAME                                                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
demo-collector                                              ClusterIP   10.100.193.182   <none>        4317/TCP,4318/TCP   74m
demo-collector-headless                                     ClusterIP   None             <none>        4317/TCP,4318/TCP   74m
demo-collector-monitoring                                   ClusterIP   10.98.233.69     <none>        8888/TCP            74m
opentelemetry-operator-controller-manager-metrics-service   ClusterIP   10.107.160.14    <none>        8443/TCP            80m
opentelemetry-operator-webhook-service                      ClusterIP   10.101.248.75    <none>        443/TCP             80m
[root@k8s-1 ~]# 
[root@k8s-1 ~]# kubectl get instrumentation -n opentelemetry-operator-system 
NAME              AGE   ENDPOINT                                                   SAMPLER                    SAMPLER ARG
instrumentation   65m   http://demo-collector.opentelemetry-operator-system:4317   parentbased_traceidratio   0.25
[root@k8s-1 ~]# kubectl logs -f -n opentelemetry-operator-system opentelemetry-operator-controller-manager-5fb5859986-zjrbv -c manager  --tail 10
{"level":"info","ts":"2024-07-09T06:44:52.845345122Z","logger":"instrumentation-resource","msg":"default","name":"instrumentation"}
{"level":"info","ts":"2024-07-09T06:44:52.866914625Z","logger":"instrumentation-resource","msg":"validate update","name":"instrumentation"}
^C
[root@k8s-1 ~]# kubectl logs -f -n opentelemetry-operator-system demo-collector-6b6c886f49-ctkpd --tail 10
2024-07-09T06:32:33.157Z        info    memorylimiterprocessor@v0.85.0/memorylimiter.go:102     Memory limiter configured       {"kind": "processor", "name": "memory_limiter", "pipeline": "traces", "limit_mib": 24080, "spike_limit_mib": 4816, "check_interval": 1}
2024-07-09T06:32:33.157Z        info    exporter@v0.85.0/exporter.go:275        Development component. May change in the future.        {"kind": "exporter", "data_type": "metrics", "name": "logging"}
2024-07-09T06:32:33.161Z        info    exporter@v0.85.0/exporter.go:275        Development component. May change in the future.        {"kind": "exporter", "data_type": "logs", "name": "logging"}
2024-07-09T06:32:33.176Z        info    service/service.go:138  Starting otelcol...     {"Version": "0.85.0", "NumCPU": 8}
2024-07-09T06:32:33.176Z        info    extensions/extensions.go:31     Starting extensions...
2024-07-09T06:32:33.176Z        warn    internal@v0.85.0/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks        {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2024-07-09T06:32:33.177Z        info    otlpreceiver@v0.85.0/otlp.go:83 Starting GRPC server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4317"}
2024-07-09T06:32:33.177Z        warn    internal@v0.85.0/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks        {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2024-07-09T06:32:33.177Z        info    otlpreceiver@v0.85.0/otlp.go:101        Starting HTTP server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4318"}
2024-07-09T06:32:33.177Z        info    service/service.go:161  Everything is ready. Begin running and processing data.

All of this seems to be working fine, so I don't understand why my app's connection is being reset:


K8S YAML:
...
annotations:
        instrumentation.opentelemetry.io/inject-java: opentelemetry-operator-system/instrumentation

[otel.javaagent 2024-07-09 15:38:41:967 +0800] [OkHttp http://demo-collector.opentelemetry-operator-system:4317/...] ERROR io.opentelemetry.exporter.internal.http.HttpExporter - Failed to export spans. The request could not be executed. Full error message: Connection reset

pavolloffay commented 2 months ago

ERROR io.opentelemetry.exporter.internal.http.HttpExporter

This is the HTTP exporter.

http://demo-collector.opentelemetry-operator-system:4317/

This is the gRPC port, so the HTTP exporter is talking to the collector's gRPC receiver.

@bongmu could you please share the instrumentation CR?
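
The mismatch described above could be resolved in either direction. What follows is a minimal sketch, not a confirmed fix. It assumes the injected Java agent defaults to the `http/protobuf` OTLP protocol (which the `HttpExporter` in the log suggests) and that the collector Service name and ports are those shown in the `kubectl get svc` output. `OTEL_EXPORTER_OTLP_PROTOCOL` is the standard OTLP exporter setting. Only one of the two options should be applied:

```yaml
# Option A (assumption: keep the agent's HTTP exporter):
# point the exporter at the collector's OTLP/HTTP port.
spec:
  exporter:
    endpoint: http://demo-collector.opentelemetry-operator-system:4318
---
# Option B (assumption: keep port 4317):
# tell the Java agent explicitly to export over gRPC.
spec:
  exporter:
    endpoint: http://demo-collector.opentelemetry-operator-system:4317
  java:
    env:
      - name: OTEL_EXPORTER_OTLP_PROTOCOL
        value: grpc
```

Both documents are fragments of an Instrumentation CR rather than complete manifests. Either way, the exporter protocol and the receiver port need to agree.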

bongmu commented 2 months ago

opentelemetry-collector.yaml:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: demo
  namespace: opentelemetry-operator-system
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 15
      batch:
        send_batch_size: 10000
        timeout: 10s

    exporters:
      logging:
      zipkin:
        endpoint: "http://zipkin-server.zipkin:9411/api/v2/spans"
        format: proto

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [logging,zipkin]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [logging]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [logging]

opentelemetry-instrumentation.yaml:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation
  namespace: opentelemetry-operator-system
spec:
  exporter:
    endpoint: http://demo-collector.opentelemetry-operator-system:4317
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"
  python:
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://demo-collector:4318
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://demo-collector.majorbio-erp-dev:4317
  dotnet:
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://demo-collector.opentelemetry-operator-system:4318

pavolloffay commented 2 months ago

  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
    env:
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://demo-collector.majorbio-erp-dev:4317

This instructs the Java agent to report data to http://demo-collector.majorbio-erp-dev:4317 rather than to http://demo-collector.opentelemetry-operator-system:4317. I would suggest deleting this env entry and restarting the app pod.
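
The suggestion above amounts to dropping the `java.env` override so that the agent falls back to `spec.exporter.endpoint`. Here is a sketch of the trimmed CR, assuming the operator derives `OTEL_EXPORTER_OTLP_ENDPOINT` from `spec.exporter.endpoint` when no per-language value is set. The remaining fields are copied from the CR posted earlier, the `python` and `dotnet` sections are omitted for brevity, and the port question raised earlier in the thread is left untouched:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation
  namespace: opentelemetry-operator-system
spec:
  exporter:
    # Assumed to be the only source of the endpoint once the override is gone.
    endpoint: http://demo-collector.opentelemetry-operator-system:4317
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
    # No env override here, so the Java agent no longer targets majorbio-erp-dev.
```

The injected environment is applied when a pod is created, so existing app pods need to be recreated before the change takes effect.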

bongmu commented 2 months ago
OpenTelemetryCollector

Oops, I was careless. Deleting this doesn't seem to work, even though my app, the Instrumentation, and the OpenTelemetryCollector are all in the majorbio-erp-dev namespace.

bongmu commented 2 months ago
I pasted the wrong thing. My apps are all in the opentelemetry-operator-system namespace, but the same error is still reported.