open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Monitoring docs seem outdated #5038

Closed gfonseca-tc closed 2 years ago

gfonseca-tc commented 2 years ago

Describe the bug I'm trying to configure some monitoring around our collector gateway, but I can't see some of the metrics listed in these docs. I've configured the collector to scrape its own metrics and can see a list of them in my backend, but I can't see otelcol_processor_dropped_spans, for instance. I'm not sure whether the docs are outdated or whether some configuration is missing.

Steps to reproduce Configure the collector to scrape its own metrics using a prometheus receiver, as in this doc, and send them to your favorite backend.

What did you expect to see? I expected to see a list of the metrics exposed by the collector and an explanation of how to use them.

What did you see instead? These recommendations list some metrics, but not all of them are being sent to my backend. I've also tried sending the metrics to a logging exporter, and they are not being sent there either.

What version did you use? Version: 0.46.0

What config did you use? Config:

receivers:
  prometheus/collector:
    config:
      scrape_configs:
        - job_name: 'o11y-collector-metrics'
          static_configs:
            - targets: ['localhost:8888']
processors:
  batch:
    timeout: 1s
    send_batch_size: 5000
    send_batch_max_size: 5000
exporters:
  logging:
    logLevel: debug
service:
  pipelines:
    metrics/collector:
      receivers: [prometheus/collector]
      processors: [batch]
      exporters: [logging]
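
For reference, the localhost:8888 target above is the collector's own telemetry endpoint. A minimal sketch of making that explicit, by extending the service section with a telemetry block (the level and address keys here are assumptions based on the v0.46.x configuration schema, not part of the original report):

service:
  telemetry:
    metrics:
      # 'detailed' exposes the full set of internal metrics (assumed default-compatible)
      level: detailed
      # default address for the collector's own Prometheus endpoint
      address: 0.0.0.0:8888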
jpkrohling commented 2 years ago

I'll add this to my queue to investigate, but I might not be able to look into this for the next couple of weeks.

gfonseca-tc commented 2 years ago

Thank you very much @jpkrohling !

jpkrohling commented 2 years ago

I just realized that you are talking about a metric related to tracing (otelcol_processor_dropped_spans), but your config only has a metrics pipeline.

I tried the simplest scenario that came to my mind and can confirm that the mentioned metrics are indeed available when a tracing pipeline is being used:

collector config (from examples/local/otel-config.yaml)

extensions:
  memory_ballast:
    size_mib: 512
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
  memory_limiter:
    # 75% of maximum memory up to 4G
    limit_mib: 1536
    # 25% of limit up to 2G
    spike_limit_mib: 512
    check_interval: 5s

exporters:
  logging:
    logLevel: debug

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]

  extensions: [memory_ballast, zpages]

Then, I generated a few traces with: $ go run ./ -otlp-insecure -traces 10 and checked the metrics on http://localhost:8888/metrics:

# HELP otelcol_exporter_enqueue_failed_log_records Number of log records failed to be added to the sending queue.
# TYPE otelcol_exporter_enqueue_failed_log_records counter
otelcol_exporter_enqueue_failed_log_records{exporter="logging",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 0
# HELP otelcol_exporter_enqueue_failed_metric_points Number of metric points failed to be added to the sending queue.
# TYPE otelcol_exporter_enqueue_failed_metric_points counter
otelcol_exporter_enqueue_failed_metric_points{exporter="logging",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 0
# HELP otelcol_exporter_enqueue_failed_spans Number of spans failed to be added to the sending queue.
# TYPE otelcol_exporter_enqueue_failed_spans counter
otelcol_exporter_enqueue_failed_spans{exporter="logging",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 0
# HELP otelcol_exporter_sent_spans Number of spans successfully sent to destination.
# TYPE otelcol_exporter_sent_spans counter
otelcol_exporter_sent_spans{exporter="logging",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 40
# HELP otelcol_process_cpu_seconds Total CPU user and system time in seconds
# TYPE otelcol_process_cpu_seconds gauge
otelcol_process_cpu_seconds{service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 0.37
# HELP otelcol_process_memory_rss Total physical memory (resident set size)
# TYPE otelcol_process_memory_rss gauge
otelcol_process_memory_rss{service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 3.6962304e+07
# HELP otelcol_process_runtime_heap_alloc_bytes Bytes of allocated heap objects (see 'go doc runtime.MemStats.HeapAlloc')
# TYPE otelcol_process_runtime_heap_alloc_bytes gauge
otelcol_process_runtime_heap_alloc_bytes{service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 5.397992e+06
# HELP otelcol_process_runtime_total_alloc_bytes Cumulative bytes allocated for heap objects (see 'go doc runtime.MemStats.TotalAlloc')
# TYPE otelcol_process_runtime_total_alloc_bytes gauge
otelcol_process_runtime_total_alloc_bytes{service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 5.4662768e+08
# HELP otelcol_process_runtime_total_sys_memory_bytes Total bytes of memory obtained from the OS (see 'go doc runtime.MemStats.Sys')
# TYPE otelcol_process_runtime_total_sys_memory_bytes gauge
otelcol_process_runtime_total_sys_memory_bytes{service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 5.73287312e+08
# HELP otelcol_process_uptime Uptime of the process
# TYPE otelcol_process_uptime counter
otelcol_process_uptime{service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 290.013392654
# HELP otelcol_processor_accepted_spans Number of spans successfully pushed into the next component in the pipeline.
# TYPE otelcol_processor_accepted_spans counter
otelcol_processor_accepted_spans{processor="memory_limiter",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 40
# HELP otelcol_processor_batch_batch_send_size Number of units in the batch
# TYPE otelcol_processor_batch_batch_send_size histogram
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="10"} 0
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="25"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="50"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="75"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="100"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="250"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="500"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="750"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="1000"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="2000"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="3000"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="4000"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="5000"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="6000"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="7000"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="8000"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="9000"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="10000"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="20000"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="30000"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="50000"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="100000"} 2
otelcol_processor_batch_batch_send_size_bucket{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",le="+Inf"} 2
otelcol_processor_batch_batch_send_size_sum{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 40
otelcol_processor_batch_batch_send_size_count{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 2
# HELP otelcol_processor_batch_timeout_trigger_send Number of times the batch was sent due to a timeout trigger
# TYPE otelcol_processor_batch_timeout_trigger_send counter
otelcol_processor_batch_timeout_trigger_send{processor="batch",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 2
# HELP otelcol_processor_dropped_spans Number of spans that were dropped.
# TYPE otelcol_processor_dropped_spans counter
otelcol_processor_dropped_spans{processor="memory_limiter",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 0
# HELP otelcol_processor_refused_spans Number of spans that were rejected by the next component in the pipeline.
# TYPE otelcol_processor_refused_spans counter
otelcol_processor_refused_spans{processor="memory_limiter",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 0
# HELP otelcol_receiver_accepted_spans Number of spans successfully pushed into the pipeline.
# TYPE otelcol_receiver_accepted_spans counter
otelcol_receiver_accepted_spans{receiver="otlp",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",transport="grpc"} 40
# HELP otelcol_receiver_refused_spans Number of spans that could not be pushed into the pipeline.
# TYPE otelcol_receiver_refused_spans counter
otelcol_receiver_refused_spans{receiver="otlp",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20",transport="grpc"} 0
jpkrohling commented 2 years ago

I'm closing this, but feel free to reopen if further clarification is needed.

gfonseca-tc commented 2 years ago

Hey @jpkrohling, sorry, I missed your answer! The example I posted was just a minimal working example; the real config file does have traces pipelines.

jpkrohling commented 2 years ago

Please post a reproducer here and I'll reopen this issue. The reproducer would be a configuration file that demonstrates the issue, plus a client sending traces. If you can use tracegen for that, even better.
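
A rough sketch of such a reproducer client, assuming tracegen is installed from the contrib repository (the module path and install step are assumptions; the flags are the same ones used in the earlier test):

# install tracegen (module path assumed: cmd/tracegen in the contrib repo)
$ go install github.com/open-telemetry/opentelemetry-collector-contrib/cmd/tracegen@latest
# send 10 traces to a local collector over insecure OTLP/gRPC
$ tracegen -otlp-insecure -traces 10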

Based on my previous test, I have reason to believe that this is working, but if you give me a way to reproduce the problem, I'll gladly work on it.

gfonseca-tc commented 2 years ago

I will create a better example, @jpkrohling, sorry about that! Just to confirm the expected behavior: if I have a traces pipeline with traces going through, I should be able to see the metric being emitted even if it is 0, right?

jpkrohling commented 2 years ago

From what I remember, metrics show up only after the first time they are reported: if you reported a gauge as 0, it will show up. If you never recorded a value for a given metric, it won't show up.
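
As a side note, dashboards and alerts typically need to tolerate series that have not been reported yet, for example by falling back to zero in PromQL (a generic Prometheus idiom, not something specific to the collector):

sum(rate(otelcol_processor_dropped_spans[5m])) or vector(0)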

tqi-raurora commented 7 months ago

I know this is a pretty old issue, but I was looking into this.

From my testing, it seems the metrics otelcol_processor_dropped_spans and otelcol_processor_dropped_metric_points are only recorded by some components, for instance the memory_limiter processor.

@gfonseca-tc's example does not use the memory_limiter, while @jpkrohling's example does; I believe that's why the metric was available there:

otelcol_processor_dropped_spans{processor="memory_limiter",service_instance_id="6e787b32-34f2-4dd1-87e3-7e6ca8ef8b9f",service_version="v0.47.0-36-gcb868e20"} 0
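
For completeness, a minimal sketch of a traces pipeline that includes the memory_limiter (values taken from jpkrohling's config above, illustrative rather than tuned recommendations); with this in place, otelcol_processor_dropped_spans should be emitted with a value of 0 once spans flow through:

receivers:
  otlp:
    protocols:
      grpc:
processors:
  batch:
  memory_limiter:
    check_interval: 5s
    limit_mib: 1536
    spike_limit_mib: 512
exporters:
  logging:
    loglevel: debug
service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter goes first so it can drop (and count) spans under pressure
      processors: [memory_limiter, batch]
      exporters: [logging]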