prometheus / prometheus

The Prometheus monitoring system and time series database.
https://prometheus.io/
Apache License 2.0

Synthetic metric (e.g. `scrape_samples_scraped`) missing HELP metadata #11884

Open samjewell opened 1 year ago

samjewell commented 1 year ago

What did you do?

I downloaded and ran the Prometheus binary with the default config to scrape itself.

I found that:

I believe the same is true of:

I'm trying to both:

Within Grafana you can typically see the HELP part of the metric metadata on hover in the metrics browser: [screenshot]

But for this metric, there's nothing useful: [screenshot]

I'm interested in this metric in particular because I think it can help diagnose problems with people setting their scrape_interval much smaller than they need (and shipping data far too frequently). For example, as described here: https://grafana.com/docs/grafana-cloud/billing-and-usage/control-prometheus-metrics-usage/changing-scrape-interval/#grafana-agent

What did you expect to see?

HELP metadata for this metric both on the /api/v1/metadata endpoint and in Grafana (on hover, in the metrics-browser).

What did you see instead? Under which circumstances?

No metadata

System information

Darwin 21.6.0 arm64

Prometheus version

prometheus, version 2.40.4 (branch: HEAD, revision: 414d31aee6586a5f29e755ae059b7d7131f1c6c8)
  build user:       root@45956a3006ca
  build date:       20221129-11:04:07
  go version:       go1.19.3
  platform:         darwin/arm64

Prometheus configuration file

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: [ "localhost:9090" ]

Alertmanager version

No response

Alertmanager configuration file

No response

Logs

No response

beorn7 commented 1 year ago

The scrape_... metrics are so-called synthetic metrics. They are not exposed by the target; they are created by Prometheus itself during scraping, based on the scrape. Another synthetic metric (and the most famous of all) is the up metric.
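Besides up, the scraper writes a handful of these synthetic series for every target. The exact set can vary by Prometheus version and enabled feature flags, but they can be queried like any other series; the job="prometheus" selector below matches the self-scrape config from this report:

```promql
# Synthetic series written by Prometheus itself after each scrape
# (the exact set may vary by version and feature flags):
up{job="prometheus"}                                     # 1 if the scrape succeeded, 0 otherwise
scrape_duration_seconds{job="prometheus"}                # how long the scrape took
scrape_samples_scraped{job="prometheus"}                 # samples the target exposed
scrape_samples_post_metric_relabeling{job="prometheus"}  # samples left after metric relabeling
scrape_series_added{job="prometheus"}                    # approximate new series in this scrape
```

None of these series currently carry HELP or TYPE metadata, which is what this issue is about.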

Metadata is still not a native concept in the internal Prometheus data model. It is only really present in the exposition format (from which it is handled quite superficially and made available via the metadata API). But since synthetic metrics are created internally and never exposed, they never get metadata attached.

One might argue the metadata should still be created for the sake of the metadata API. To me, that would feel like patching an already quite patchy API even more. It would be much better to finally fully embrace metadata in the internal data model (as is beginning to happen right now, out of necessity, with native histograms).

roidelapluie commented 1 year ago

To get back to your original concern, you can estimate the scrape interval by counting the number of times the up metric was inserted over the last hour with the following query:

count_over_time(up[1h])

This query returns the number of times the targets were scraped over the last hour.
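Turning that count into an interval is simple arithmetic: divide the window length in seconds by the sample count. A sketch, assuming the target was present for the whole window:

```promql
# Approximate effective scrape interval, in seconds, over the last hour
# (assumes the target existed for the full hour):
3600 / count_over_time(up[1h])
```

With the 15s scrape_interval from the config above, this should return a value close to 15.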

09jvilla commented 1 year ago

I get that this is "not as easy as it looks", but can someone confirm the metric type of scrape_samples_scraped? Am I correct that it's a gauge and not a counter?

I'm guessing it's a gauge based on this blog post, which says:

Ingestion rate can be calculated from scrape_samples_scraped metric exposed by Prometheus using the following PromQL query:

sum_over_time(scrape_samples_scraped[5m]) / 300
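Because scrape_samples_scraped is a gauge sampled once per scrape, summing it over a window and dividing by the window length in seconds approximates samples ingested per second. A per-job variant of the blog's query (the sum by (job) grouping is my own addition, not from the blog):

```promql
# Approximate ingested samples per second, per job, over 5 minutes:
sum by (job) (sum_over_time(scrape_samples_scraped[5m])) / 300
```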

roidelapluie commented 1 year ago

Yes, it is a gauge.

09jvilla commented 1 year ago

Thanks!

alexgreenbank commented 1 month ago

Hello from the bug scrub.

Would this be something that is covered by the proposal to add metadata as a first class citizen in Prometheus? Should we just add metadata for these synthetic metrics or does it need to be discussed as part of the wider Metadata proposals?

beorn7 commented 1 month ago

I guess there are three parts to the question:

  1. Do we want synthetic metrics to have metadata?
  2. Do we want to expose that metadata via the current (and hopefully soon legacy) metadata endpoints?
  3. Do we want to tackle this issue before we have a 1st class concept for metadata?

Obviously, (1) is what this issue is asking for, and if we don't want metadata for synthetic metrics, we can close this issue right now. However, once metadata is a 1st class citizen, it would be weird if synthetic metrics didn't have any. (This is in the same vein as recording rules not having help, type, unit, etc. right now; once we have proper metadata support, we would soon start to miss those things for recording rules, too.)

As (3) states, maybe we don't want to tackle this issue before we have that 1st class metadata concept (and know what it looks like). With 1st class metadata, I would assume that metadata is returned via the normal query APIs and not necessarily via the current metadata APIs, which are a bit wonky and were mostly introduced on the fly to have at least some metadata support based on the data cached by the scraper (but without TSDB involvement).

Which leads to (2): once we have metadata for synthetic metrics, should we shoehorn that data into the legacy metadata APIs? If the answer is yes, we could start with it right now, but if the answer is no, we really have to wait for 1st class metadata.

beorn7 commented 3 weeks ago

Related to #11969 (which is about metadata for recording rules).