jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform
https://www.jaegertracing.io/
Apache License 2.0

[Bug]: Monitor tab has no data even though the spans appear in search tab (need help / urgent) #6199

Closed. putthiwat0cha closed this issue 5 days ago.

putthiwat0cha commented 5 days ago

What happened?

I am trying to trace the performance of my Python service. The spans appear normally in the Search tab:

[screenshot]

but the Monitor tab shows no data at all, for every span kind in the dropdown.

[screenshot]

Steps to reproduce

This is "jaeger.yml"

services:
  jaeger:
    networks:
      - "backend"
    image: "jaegertracing/all-in-one:1.62.0"
    environment:
      - "METRICS_STORAGE_TYPE=prometheus"
      - "PROMETHEUS_SERVER_URL=http://prometheus:9090"
      - "PROMETHEUS_QUERY_SUPPORT_SPANMETRICS_CONNECTOR=true"
    ports:
      - "4317:4317"
      - "14269:14269"
      - "16686:16686"
    restart: "on-failure"
  otel:
    networks:
      - "backend"
    image: "otel/opentelemetry-collector-contrib:0.113.0"
    volumes:
      - "./otel.yml:/etc/otel-collector/otel.yml"
    command: [ "--config=/etc/otel-collector/otel.yml" ]
    depends_on:
      - "jaeger"
    ports:
      - "4318:4318"
      - "8889:8889"
    restart: "on-failure"
  microsim:
    networks:
      - "backend"
    image: "yurishkuro/microsim:v0.4.1"
    command: "-d 24h -s 500ms"
    environment:
      - "OTEL_EXPORTER_OTLP_ENDPOINT=http://otel:4318"
      - "OTEL_EXPORTER_OTLP_INSECURE=true"
    depends_on:
      - "otel"
    restart: "on-failure"
  prometheus:
    networks:
      - "backend"
    image: "prom/prometheus:v2.55.1"
    volumes:
      - "./prometheus.yml:/etc/prometheus/prometheus.yml"
    command: [ "--config.file=/etc/prometheus/prometheus.yml" ]
    depends_on:
      - "jaeger"
    ports:
      - "9090:9090"
    restart: "on-failure"
networks:
  backend:
    driver: "bridge"


This is "otel.yml"

receivers:
  otlp:
    protocols:
      grpc:
      http:
        endpoint: "0.0.0.0:4318"
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  otlp:
    endpoint: "http://jaeger:4317"
    tls:
      insecure: true
connectors:
  spanmetrics:
processors:
  batch:
service:
  pipelines:
    traces:
      receivers: [ "otlp" ]
      processors: [ "batch" ]
      exporters: [ "spanmetrics" ]
    metrics/spanmetrics:
      receivers: [ "spanmetrics" ]
      processors: [ "batch" ]
      exporters: [ "prometheus" ]
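The collector wiring in otel.yml can be checked mechanically. The sketch below is minimal and assumes that the components and pipelines above have been transcribed by hand into a Python dict, so no YAML parser is needed. The names `OTEL_CONFIG` and `unused_components` are hypothetical. The function lists components that are declared but never referenced by a pipeline. For this file it reports the `otlp` exporter (the one pointing at `jaeger:4317`) as unused, because the `traces` pipeline exports only to `spanmetrics`.

```python
# Hypothetical hand-transcription of otel.yml: component names per section,
# plus each pipeline's receivers/processors/exporters.
OTEL_CONFIG = {
    "receivers": ["otlp"],
    "processors": ["batch"],
    "exporters": ["prometheus", "otlp"],
    "connectors": ["spanmetrics"],
    "pipelines": {
        "traces": {
            "receivers": ["otlp"],
            "processors": ["batch"],
            "exporters": ["spanmetrics"],
        },
        "metrics/spanmetrics": {
            "receivers": ["spanmetrics"],
            "processors": ["batch"],
            "exporters": ["prometheus"],
        },
    },
}


def unused_components(config: dict) -> dict:
    """Return declared components that no pipeline references."""
    used = {"receivers": set(), "processors": set(), "exporters": set()}
    for pipeline in config["pipelines"].values():
        for section in used:
            used[section].update(pipeline.get(section, []))

    unused = {
        section: [name for name in config[section] if name not in used[section]]
        for section in used
    }
    # Treat a connector as wired only if some pipeline exports to it
    # and some pipeline receives from it.
    unused["connectors"] = [
        name
        for name in config["connectors"]
        if not (name in used["exporters"] and name in used["receivers"])
    ]
    return unused


print(unused_components(OTEL_CONFIG))
# {'receivers': [], 'processors': [], 'exporters': ['otlp'], 'connectors': []}
```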


This is "prometheus.yml"

global:
  scrape_interval: "15s"
  evaluation_interval: "15s"
scrape_configs:
  - job_name: "tracing"
    metrics_path: "/metrics"
    static_configs:
    - targets: [ "otel:8889" ]
  - job_name: "all"
    metrics_path: "/metrics"
    static_configs:
    - targets: [ "jaeger:14269" ]


This is the command to start the Jaeger stack in Docker containers:

docker-compose -f "D:/Jaeger/jaeger.yml" up -d

[screenshot]

This is the state of the Prometheus targets:

[screenshot]

[screenshot]
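The target state shown in these screenshots can also be read as text from the Prometheus HTTP API. This is a minimal sketch that assumes Prometheus is reachable at http://localhost:9090, the port published in jaeger.yml. `fetch_targets` and `target_health` are hypothetical helper names. `/api/v1/targets` is the standard Prometheus endpoint. A target being `up` only shows that the scrape succeeded. It does not show that the target exposes any span metrics.

```python
import json
import urllib.request

# Assumption: Prometheus is reachable on the host port published in jaeger.yml.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_targets(base_url: str = PROMETHEUS_URL) -> dict:
    """GET /api/v1/targets from the Prometheus HTTP API and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def target_health(targets: dict) -> dict:
    """Map each active target's scrape URL to its health ("up", "down" or "unknown")."""
    return {
        target["scrapeUrl"]: target["health"]
        for target in targets["data"]["activeTargets"]
    }
```

With the stack running, `target_health(fetch_targets())` returns a dict keyed by scrape URL, for example `http://otel:8889/metrics` and `http://jaeger:14269/metrics` for the two jobs in prometheus.yml.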

This is "tracing.py"

import time
import http.client
import json
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.metrics import get_meter
trace.set_tracer_provider(TracerProvider(resource=Resource.create({SERVICE_NAME: "test-service"})))
otlp_exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(otlp_exporter))
meter = get_meter(__name__)
request_counter = meter.create_counter("http_requests", unit="1", description="Counts number of HTTP requests")
request_duration = meter.create_histogram("http_request_duration_seconds", unit="seconds", description="Measures the duration of HTTP requests")
tracer = trace.get_tracer(__name__)
url = "api.restful-api.dev"
endpoint = "/objects"
headers = {"Accept":"*/*", "Connection":"keep-alive", "Content-Type":"application/json"}
body = {"name":"Test Object", "data": {"year":2024, "price":1.00, "CPU model":"CoreX", "Hard disk size":"1 TB"}}
while True:
    with tracer.start_as_current_span("api_call_span", kind=trace.SpanKind.CLIENT) as root_span:
        try:
            start_time = time.time()
            with tracer.start_as_current_span("http_post_span", kind=trace.SpanKind.CLIENT):
                connection = http.client.HTTPSConnection(url, timeout=10)
                connection.request("POST", endpoint, body=json.dumps(body), headers=headers)
                response = connection.getresponse()
                response_body = response.read().decode("utf-8")
                elapsed_time = time.time() - start_time
                root_span.set_attribute("http.status_code", response.status)
                root_span.set_attribute("http.method", "POST")
                root_span.set_attribute("http.url", f"https://{url}{endpoint}")
                request_counter.add(1)
                request_duration.record(elapsed_time)
                print("Response Status:", response.status)
                print("Response Body:", response_body)    
        except Exception as e:
            root_span.record_exception(e)
            root_span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
            print("An error occurred:", str(e))
        finally:
            connection.close()
    time.sleep(30)
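The exporter endpoint in tracing.py determines which container receives the spans. This is a minimal sketch that assumes the script runs on the Docker host. It uses the host ports transcribed from jaeger.yml. `PUBLISHED_PORTS` and `service_for_endpoint` are hypothetical names. Only the `jaeger` service publishes `localhost:4317`. As far as this configuration shows, the script's spans therefore go straight to Jaeger and never enter the collector's `traces` pipeline, which is where the `spanmetrics` connector runs.

```python
from typing import Dict, List, Optional

# Host ports published in jaeger.yml, transcribed by hand (hypothetical
# structure; only services with a "ports:" section are listed).
PUBLISHED_PORTS: Dict[str, List[int]] = {
    "jaeger": [4317, 14269, 16686],
    "otel": [4318, 8889],
    "prometheus": [9090],
}


def service_for_endpoint(endpoint: str, published: Dict[str, List[int]]) -> Optional[str]:
    """Return the Compose service that a localhost endpoint reaches, if any."""
    # Drop an optional scheme ("http://") and any path ("/v1/traces").
    hostport = endpoint.split("://", 1)[-1].split("/", 1)[0]
    host, _, port = hostport.rpartition(":")
    if host not in ("localhost", "127.0.0.1") or not port.isdigit():
        return None
    for service, ports in published.items():
        if int(port) in ports:
            return service
    return None


# tracing.py uses OTLPSpanExporter(endpoint="localhost:4317", insecure=True).
print(service_for_endpoint("localhost:4317", PUBLISHED_PORTS))  # jaeger
```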


This is the command to run the Python service script:

python D:/Jaeger/test_script/tracing.py

[screenshot]

Expected behavior

If spans from the Python service already appear in the Search tab, the Monitor tab should show data for them as well.

yurishkuro commented 5 days ago

We have an extensive troubleshooting section: https://www.jaegertracing.io/docs/1.63/spm/#troubleshooting

putthiwat0cha commented 5 days ago


@yurishkuro

I had already reviewed the documentation before building these scripts, but it did not help me resolve this issue. I have also tried modifying the scripts many times over the past few weeks, with no luck.

Could you please test what I provided in "Steps to reproduce" and explain what I did wrong? I have run out of ideas for how to fix it.

yurishkuro commented 5 days ago

Please post the output of the troubleshooting steps.

putthiwat0cha commented 5 days ago


@yurishkuro

Do you mean this one?

[screenshot]

[screenshot]

[screenshot]

I think I have already provided enough information. Could you please help me check what is wrong with my scripts?

I am not an expert, so I won't be able to resolve this issue if the explanation is too brief.

yurishkuro commented 5 days ago

Your screenshots are not showing any requests to the APIs for retrieving SPM data.
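You can issue the same SPM requests as the Monitor tab by hand. This shows whether Jaeger returns any data, independently of the UI. This is a minimal sketch under stated assumptions. The `/api/metrics/<metric>` path and the `service` and `spanKind` parameters are assumed to mirror what the Jaeger UI requests from its query service. This is an internal API, so confirm the exact paths in the browser's network tab. `spm_url` and `fetch_spm` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: Jaeger's query service is reachable on the host port published in jaeger.yml.
JAEGER_QUERY_URL = "http://localhost:16686"


def spm_url(metric: str, service: str, span_kind: Optional[str] = None,
            base_url: str = JAEGER_QUERY_URL) -> str:
    """Build a Jaeger SPM request URL; `metric` is e.g. "calls" or "errors".

    Assumption: the path and the `service`/`spanKind` parameters mirror what
    the Monitor tab requests; confirm them in the browser's network tab.
    """
    params = {"service": service}
    if span_kind is not None:
        params["spanKind"] = span_kind
    return f"{base_url}/api/metrics/{metric}?{urllib.parse.urlencode(params)}"


def fetch_spm(url: str) -> dict:
    """GET the URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

tracing.py creates `CLIENT` spans for the service `test-service`, so a matching request would be `fetch_spm(spm_url("calls", "test-service", "client"))`. The `"client"` value is itself an assumption about the accepted spelling.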

But I was asking about metrics in Prometheus: https://www.jaegertracing.io/docs/1.63/spm/#query-prometheus
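The Prometheus check in the linked section can be scripted against the standard Prometheus HTTP API (`/api/v1/query`). This is a minimal sketch that assumes Prometheus is reachable at http://localhost:9090. `instant_query`, `series_count` and `CANDIDATES` are hypothetical names. The candidate metric names are the ones the reporter checks in this thread.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable on the host port published in jaeger.yml.
PROMETHEUS_URL = "http://localhost:9090"

# Metric names checked in this thread; the exact names depend on the
# collector version and the spanmetrics connector settings.
CANDIDATES = ["calls", "calls_total", "duration_bucket",
              "duration_milliseconds_bucket", "duration_seconds_bucket"]


def instant_query(promql: str, base_url: str = PROMETHEUS_URL) -> dict:
    """Run an instant query through the Prometheus HTTP API (/api/v1/query)."""
    query = urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{query}", timeout=5) as resp:
        return json.load(resp)


def series_count(response: dict) -> int:
    """Number of series returned by an instant query."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "query failed"))
    return len(response["data"]["result"])
```

With the stack running, `{name: series_count(instant_query(name)) for name in CANDIDATES}` counts the series for each name. Zero for every name means no span metrics are stored in Prometheus.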

putthiwat0cha commented 5 days ago


@yurishkuro

  • duration_bucket
  • duration_milliseconds_bucket
  • duration_seconds_bucket
  • calls
  • calls_total

All of them return nothing, in both the Table and Graph tabs.

[screenshot]

I think it would be more efficient to resolve this issue if you tried building my scripts yourself:

https://drive.google.com/file/d/1zpLS9ecTyAQwrknIRCSsDlL3ZBqowykB/view?usp=sharing

It takes only four commands to build:

docker login --username=DOCKER_ACCOUNT_USERNAME
docker-compose -f "D:/Jaeger/jaeger.yml" up -d
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
python D:/Jaeger/test_script/tracing.py

yurishkuro commented 5 days ago


That means the metrics are not making it from the OTel Collector to Prometheus.
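One way to locate the break is to read the collector's Prometheus exporter directly, before Prometheus is involved. This is a minimal sketch that assumes the exporter is published on localhost:8889, as in jaeger.yml. `scrape`, `metric_names` and `span_metric_names` are hypothetical helpers with a deliberately simple parser for the Prometheus text format.

If no `calls`/`duration` series appear here, the spanmetrics connector is not producing metrics. One possible cause is that no spans reach the collector's `traces` pipeline. If the series do appear here, the problem is on the Prometheus scrape or query side.

```python
import urllib.request
from typing import List, Set

# Assumption: the collector's prometheus exporter is published on the host as
# in jaeger.yml (inside the Compose network the same endpoint is otel:8889).
COLLECTOR_METRICS_URL = "http://localhost:8889/metrics"


def scrape(url: str = COLLECTOR_METRICS_URL) -> str:
    """Fetch the raw Prometheus text exposition from the exporter."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def metric_names(exposition: str) -> Set[str]:
    """Collect sample names from Prometheus text format (simplified parser)."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        # Sample lines look like NAME{labels} VALUE or NAME VALUE.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def span_metric_names(exposition: str) -> List[str]:
    """Names that look like spanmetrics output; a substring match is used
    in case the names carry a namespace prefix."""
    return sorted(n for n in metric_names(exposition) if "calls" in n or "duration" in n)
```

With the stack running, `span_metric_names(scrape())` returns an empty list when the collector exposes no span metrics.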