open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Transient error StatusCode.UNAVAILABLE encountered while exporting span batch #6363

Closed umgbhalla closed 1 year ago

umgbhalla commented 2 years ago

Describe the bug I have noticed an issue on the OpenTelemetry Collector HTTP port: it returns StatusCode.UNAVAILABLE when sending traces.

Steps to reproduce Set up the OpenTelemetry Collector with Docker Compose or Kubernetes (I have confirmed this on both) and use this repo to produce traces (edit ./src/helpers/tracing/index.ts to change the endpoint if necessary).

What did you expect to see? No status-code error and traces being collected, since OTLP over gRPC is working.

What did you see instead? StatusCode.UNAVAILABLE, but only on OTLP HTTP.

What version did you use? Version: 0.60.0

What config did you use? docker-compose.yaml

version: "2.4"

services:
  otel-collector:
    container_name: otel-collector
    image: otel/opentelemetry-collector:0.60.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    # user: root # required for reading docker container logs
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    environment:
      - OTEL_RESOURCE_ATTRIBUTES=host.name=otel-host,os.type=linux
    ports:
      # - "1777:1777"     # pprof extension
      - "4317:4317"     # OTLP gRPC receiver
      - "4318:4318"     # OTLP HTTP receiver
      # - "8888:8888"     # OtelCollector internal metrics
      # - "8889:8889"     # signoz spanmetrics exposed by the agent
      # - "9411:9411"     # Zipkin port
      # - "13133:13133"   # health check extension
      # - "14250:14250"   # Jaeger gRPC
      # - "14268:14268"   # Jaeger thrift HTTP
      # - "55678:55678"   # OpenCensus receiver
      # - "55679:55679"   # zPages extension
    restart: on-failure
    networks:
      - api-dockernet

networks:
  api-dockernet:
    driver: bridge

otel-collector-config.yaml

receivers:
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
      thrift_compact:
        endpoint: 0.0.0.0:6831
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:
            - http://*
            - https://*
  zipkin:
    endpoint: 0.0.0.0:9411

processors:
  batch:
    send_batch_size: 4000
    send_batch_max_size: 4000
    timeout: 10s
  # If set to null, will be overridden with values based on k8s resource limits
  memory_limiter: null

exporters:
  otlp:
    endpoint: '<redacted>:80'
    tls:
      insecure: true
    sending_queue:
      queue_size: 1000000
  prometheusremotewrite:
    endpoint: 'http://<redacted>/write'
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      exporters: [otlp]
      processors: [batch]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]

Environment OS: any

Additional context This issue only happens on OTLP HTTP, not on OTLP gRPC.
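
A quick way to isolate the HTTP receiver is to send one test span over OTLP HTTP and watch the collector logs. This is a minimal Python sketch under my assumptions (the actual app uses the TypeScript SDK, and localhost:4318 should be replaced with your collector address):

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Point the exporter at the collector's OTLP HTTP receiver (port 4318, path /v1/traces).
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
trace.set_tracer_provider(provider)

with trace.get_tracer(__name__).start_as_current_span("otlp-http-smoke-test"):
    pass  # span body is irrelevant; we only care whether the export succeeds

provider.shutdown()  # flushes the batch processor and exports the span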

adityaraibytelearn commented 2 years ago

Is this resolved? I can see the same issue while using gRPC.

benjamingorman commented 1 year ago

I'm also seeing this over both grpc and http.

2023-01-12 17:08:10,079 WARNING opentelemetry.exporter.otlp.proto.grpc.exporter /usr/local/lib/python3.8/dist-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py:356   Transient error StatusCode.UNAVAILABLE encountered while exporting traces, retrying in 16s.

I'm running the Jaeger all-in-one image like this:

docker run --name jaeger   -e COLLECTOR_OTLP_ENABLED=true -e DJAEGER_AGENT_HOST=0.0.0.0  -p 16686:16686   -p 4317:4317   -p 4318:4318  jaegertracing/all-in-one:1.35

h4ckroot commented 1 year ago

I had a similar issue, and I found that this error is emitted when your application cannot reach the collector. This can happen if you run the application and the collector on two different networks (or in two different docker-compose files that do not share the same network).

I hope this helps!
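
For example, in a single docker-compose file both containers can join the same network. This is only a sketch based on the compose file above; my-app and its image are placeholders:

version: "2.4"

services:
  otel-collector:
    image: otel/opentelemetry-collector:0.60.0
    networks:
      - api-dockernet
  my-app:
    image: my-app:latest  # placeholder for your instrumented application
    environment:
      # Reach the collector by its service name, not localhost
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
    networks:
      - api-dockernet

networks:
  api-dockernet:
    driver: bridge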

charliebarber commented 1 year ago

I am also getting this issue in a Docker container, between an instrumented Python app and the collector. They are on the same network with the bridge driver. Can't seem to fix it.

LronDC commented 1 year ago

May I ask why this issue has been closed?

gilbertobr commented 1 year ago

I am also having the same problem.

Script template used:

import logging

from opentelemetry import trace
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import (
    OTLPLogExporter,
)
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)

logging.basicConfig(level=logging.DEBUG)

logger_provider = LoggerProvider(
    resource=Resource.create(
        {
            "service.name": "shoppingcart",
            "service.instance.id": "instance-12",
        }
    ),
)
set_logger_provider(logger_provider)

exporter = OTLPLogExporter(endpoint="grpc.otel-collector.my.domain.io:80", insecure=True, timeout=20)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
handler = LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider)

# Attach OTLP handler to root logger
logging.getLogger().addHandler(handler)

# Log directly
logging.info("Jackdaws love my big sphinx of quartz.")

# Create different namespaced loggers
logger1 = logging.getLogger("myapp.area1")
logger2 = logging.getLogger("myapp.area2")

logger1.debug("Quick zephyrs blow, vexing daft Jim.")
logger1.info("How quickly daft jumping zebras vex.")
logger2.warning("Jail zesty vixen who grabbed pay from quack.")
logger2.error("The five boxing wizards jump quickly.")

# Trace context correlation
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("foo"):
    # Do something
    logger2.error("Hyderabad, we have a major problem.")

logger_provider.shutdown()

gilbertobr commented 1 year ago

I noticed that nginx (the proxy) returns 400:

 "PRI * HTTP/2.0" 400 150 "-" "-" 0 5.001 [] [] - - - - 

sherlockliu commented 1 year ago

Any updates on this one? It sounds like it hasn't been resolved but has been closed.

rodrigoazv commented 1 year ago

In my case I was using the wrong hostname. Because of docker-compose, you should use the container name; in my case,

http://jaeger instead of http://localhost
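
For example, with the Python gRPC exporter (a sketch; the jaeger hostname and port 4317 are assumptions based on the all-in-one setup above):

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Use the Docker container/service name as the host, not localhost.
exporter = OTLPSpanExporter(endpoint="jaeger:4317", insecure=True)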

tquach-evertz commented 1 year ago

Any updates on this one? It sounds like it hasn't been resolved but has been closed.

The same issue has just happened with our application... Looks like the issue hasn't been resolved yet.

john-pl commented 1 year ago

We're having the same problem. I don't feel this should be closed.

wizrds commented 1 year ago

I'm encountering the same issue as well. I'm running otel-collector in a Docker container with the gRPC port exposed and connecting to it from a native Python application. The line Transient error StatusCode.UNAVAILABLE encountered while exporting metrics, retrying in 1s. will sometimes spam the logs and other times I don't see it once. Is there any way to hide the output at least?
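
Not a fix, but the retry warnings can at least be silenced by raising the level of the exporter's logger. This is a workaround sketch; the logger name is assumed from the module path in the warning shown earlier:

import logging

# Hide "Transient error ... retrying" WARNING messages from the OTLP gRPC exporter.
logging.getLogger("opentelemetry.exporter.otlp.proto.grpc.exporter").setLevel(logging.ERROR)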

menyisskov commented 1 year ago

We're having the same issue. We run the app on k8s (docker desktop), and the all-in-one on the same laptop with the docker run command.

Any ideas what could be causing it?

chansonzhang commented 9 months ago

I ran a jaeger-all-in-one.exe binary on Windows and exported spans from an instrumented Sanic app; it failed with the error "Failed to export batch. Status code: StatusCode.UNAVAILABLE".

kevarr commented 3 months ago

The solution (using opentelemetry-python) for me was to fix my OTLPSpanExporter import. I was attempting to export gRPC spans, but was importing:

from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

Instead I needed to import:

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

If you're exporting using http/protobuf, import from opentelemetry.exporter.otlp.proto.http.trace_exporter instead.

It's a very subtle difference. I suppose I should've paid closer attention when my IDE made an import suggestion for me.
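
To make the pairing explicit (a sketch; the endpoints assume a local collector with the default OTLP ports):

# gRPC exporter pairs with the OTLP gRPC receiver (default port 4317)
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
    OTLPSpanExporter as GRPCSpanExporter,
)
# http/protobuf exporter pairs with the OTLP HTTP receiver (default port 4318, /v1/traces)
from opentelemetry.exporter.otlp.proto.http.trace_exporter import (
    OTLPSpanExporter as HTTPSpanExporter,
)

grpc_exporter = GRPCSpanExporter(endpoint="localhost:4317", insecure=True)
http_exporter = HTTPSpanExporter(endpoint="http://localhost:4318/v1/traces")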