prometheus / prometheus

The Prometheus monitoring system and time series database.
https://prometheus.io/
Apache License 2.0

Synthetic metric (e.g.`scrape_samples_scraped`) missing HELP metadata #11884

Open samjewell opened 1 year ago

samjewell commented 1 year ago

What did you do?

I downloaded and ran the Prometheus binary with the default config to scrape itself.

I found that the synthetic metric `scrape_samples_scraped` has no HELP metadata on the /api/v1/metadata endpoint.

I believe the same is true of the other synthetic `scrape_...` metrics.

I'm trying to both understand what this metric records and use it to diagnose scrape-interval problems.
Within Grafana you can typically see the HELP part of the metric metadata on hover in the metrics browser (screenshot omitted).

But for this metric, there's nothing useful (screenshot omitted).

I'm interested in this metric in particular because I think it can help diagnose problems with people setting their scrape_interval much smaller than they need (and shipping data far too frequently). For example, as described here: https://grafana.com/docs/grafana-cloud/billing-and-usage/control-prometheus-metrics-usage/changing-scrape-interval/#grafana-agent

What did you expect to see?

HELP metadata for this metric both on the /api/v1/metadata endpoint and in Grafana (on hover, in the metrics-browser).

What did you see instead? Under which circumstances?

No metadata

System information

Darwin 21.6.0 arm64

Prometheus version

prometheus, version 2.40.4 (branch: HEAD, revision: 414d31aee6586a5f29e755ae059b7d7131f1c6c8)
  build user:       root@45956a3006ca
  build date:       20221129-11:04:07
  go version:       go1.19.3
  platform:         darwin/arm64

Prometheus configuration file

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: [ "localhost:9090" ]

Alertmanager version

No response

Alertmanager configuration file

No response

Logs

No response

beorn7 commented 1 year ago

The `scrape_...` metrics are so-called synthetic metrics. They are not exposed by the target; Prometheus itself creates them during each scrape, based on the outcome of that scrape. Another synthetic metric (and the most famous of all) is the `up` metric.

Metadata is still not a native concept in the internal Prometheus data model. It is only really present in the exposition format (from which it is handled quite superficially and made available via the metadata API). But since synthetic metrics are created internally and never exposed, they never get metadata attached.

One might argue that the metadata should still be created for the sake of the metadata API. To me, that would feel like patching an already quite patchy API even more. It would be much better to finally embrace metadata fully in the internal data model (as is beginning to happen right now, out of necessity, with native histograms).
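One way to see the gap described above is to look at what the metadata API returns. A minimal sketch follows; the JSON payload is a hand-written, illustrative example in the shape of a /api/v1/metadata response, not real output from a running server:

```python
import json

# Illustrative (hand-written) response in the shape returned by
# GET /api/v1/metadata: metadata exists only for metrics that appeared
# in a scraped exposition, so synthetic metrics are absent from "data".
sample_response = json.loads("""
{
  "status": "success",
  "data": {
    "go_goroutines": [
      {"type": "gauge", "help": "Number of goroutines that currently exist.", "unit": ""}
    ]
  }
}
""")

def help_for(response, metric):
    """Return the HELP string for a metric, or None if no metadata exists."""
    entries = response["data"].get(metric, [])
    return entries[0]["help"] if entries else None

print(help_for(sample_response, "go_goroutines"))
# Synthetic metrics such as scrape_samples_scraped never appear here:
print(help_for(sample_response, "scrape_samples_scraped"))  # None
```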

roidelapluie commented 1 year ago

To get back to your original concern, you can estimate the scrape interval by counting how many times the `up` metric was inserted over the last hour, with the following query:

count_over_time(up[1h])

This query returns the number of times the targets were scraped over the last hour.
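The arithmetic behind that estimate can be sketched as follows; `estimate_scrape_interval` is a hypothetical helper, and its input is whatever `count_over_time(up[1h])` returns for a target:

```python
def estimate_scrape_interval(samples_last_hour: float, window_seconds: int = 3600) -> float:
    """Estimate the scrape interval in seconds from the number of `up`
    samples inserted over the window, i.e. count_over_time(up[1h])."""
    return window_seconds / samples_last_hour

# A target scraped every 15s yields roughly 240 `up` samples per hour:
print(estimate_scrape_interval(240))  # 15.0
```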

09jvilla commented 11 months ago

I get that this is "not as easy as it looks", but can someone confirm the metric type of scrape_samples_scraped? Am I correct that it's a gauge and not a counter?

I'm guessing it's a gauge based on this blog post, which says:

Ingestion rate can be calculated from scrape_samples_scraped metric exposed by Prometheus using the following PromQL query:

sum_over_time(scrape_samples_scraped[5m]) / 300
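The reasoning behind that query can be sketched with some arithmetic: each scrape records one sample of the `scrape_samples_scraped` gauge, so summing it over a 5-minute window and dividing by 300 seconds approximates samples ingested per second. A minimal sketch with a hypothetical helper, assuming the gauge holds a constant value for one target:

```python
def ingestion_rate(samples_per_scrape: float, scrape_interval_s: float,
                   window_s: float = 300.0) -> float:
    """Mimic sum_over_time(scrape_samples_scraped[5m]) / 300 for a single
    target: each scrape in the window contributes one gauge sample to the sum."""
    scrapes_in_window = window_s / scrape_interval_s
    return (scrapes_in_window * samples_per_scrape) / window_s

# 1000 samples per scrape every 15s -> 20 scrapes in 5m,
# sum = 20000, rate = 20000 / 300 ≈ 66.7 samples/s:
print(round(ingestion_rate(1000, 15), 1))  # 66.7
```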

roidelapluie commented 11 months ago

Yes, it is a gauge.

09jvilla commented 11 months ago

Thanks!