vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Adding an "instance" tag for the prometheus scraper #9322

Closed jszwedko closed 3 years ago

jszwedko commented 3 years ago

Discussed in https://github.com/vectordotdev/vector/discussions/9321

Originally posted by **candlerb** September 23, 2021

I've been playing with the prometheus scraper, and it's pretty awesome:

```toml
[sources.node_exporter]
type = "prometheus_scrape"
endpoints = [
  "http://host1:9100/metrics",
  "http://host2:9100/metrics",
]
scrape_interval_secs = 60

# Print scraped metrics to stdout
[sinks.print]
type = "console"
inputs = ["node_exporter"]
encoding.codec = "json"
```

Scraping works instantly. The problem is that I can't see how to distinguish the metrics originating from different hosts, e.g.:

```
$ vector --config vector-scrape.toml | grep node_memory_Active_bytes
{"name":"node_memory_Active_bytes","timestamp":"2021-09-23T17:14:30.775368953Z","kind":"absolute","gauge":{"value":1648369664.0}}
{"name":"node_memory_Active_bytes","timestamp":"2021-09-23T17:14:30.817781032Z","kind":"absolute","gauge":{"value":305164288.0}}
```

Which host does each metric belong to? With a "real" Prometheus server, it would add a label `instance="..."`, which defaults to the `__address__` that was scraped.

What's the standard way in Vector of distinguishing the same metric from multiple sources: a tag? A namespace? Does that association already exist for prometheus scrapes, but is hidden in the "console" sink output?

I can see that [internal metrics](https://vector.dev/docs/reference/configuration/sources/internal_metrics/) have `tags.host_key` and `tags.pid_key`, but I don't see anything equivalent for prometheus scraping. I'm only just getting to grips with the Vector data model, so I apologise if I've missed something obvious.
candlerb commented 3 years ago

If I read the code in #9330 correctly, it is extracting just the host and port out of the endpoint URL, correct?

I don't think this logic necessarily needs to be hard-coded. You could have an "endpoint" tag and let VRL massage it, using `parse_url`, if the user wants it to be more Prometheus-like.

However, if this logic is going to be built-in, I have a suggestion: if the URL contains a fragment identifier, then use the fragment as the instance tag instead of the address:port. For example:

```toml
endpoints = [
  'http://192.0.2.1:9100/metrics#foo',
  'http://192.0.2.2:9100/metrics#bar',
]
```

This would set the instance tag to 'foo' and 'bar' respectively for those scrapes, giving meaningful instance labels.

This is the sort of thing you can do using relabelling in Prometheus, but you wouldn't be able to do it if the "instance" label has already been stripped down to host and port.

Or you could provide both `endpoint_tag` and `instance_tag`.

EDIT: a quick test shows that the fragment is already correctly removed from the path in the HTTP request, i.e. tcpdump shows `GET /metrics HTTP/1.1`

jszwedko commented 3 years ago

Thanks for the thoughts, @candlerb. I was thinking it'd be nice to match the Prometheus scraper's concept of instance specifically, since Prometheus users are likely to be familiar with it already. I could see adding support for labeling the endpoint as well, though, and then users could use VRL to extract the info they want as usual.

nivekuil commented 3 years ago

I think this would also address https://github.com/vectordotdev/vector/issues/6953?

I wonder if this would be enough information to infer the job label and be truly Prometheus-compatible. Can I have a transform that's like a map from endpoint to job name?

candlerb commented 3 years ago

It looks like this issue is indeed a dupe of #6953, yes.

When you say "infer the job label": the job label is static in prometheus, fixed for all targets in the scrape job. You can do it like this in VRL:

```toml
[sources.scrape_node]
type = "prometheus_scrape"
endpoints = [
  'http://192.0.2.1:9100/metrics#foo',
  'http://192.0.2.2:9100/metrics#bar',
]
scrape_interval_secs = 60

[transforms.tag_node]
type = "remap"
inputs = ["scrape_node"]
source = '''
.tags.job = "node"
'''
```

This ticket is about the instance label, which identifies the individual scraped endpoint.

I'm proposing:

  1. If you set `endpoint_tag = "endpoint"`, then you get the whole URL as the tag.
  2. If you set `instance_tag = "instance"`, then you get the URL fragment, or, if that doesn't exist, the address and port.

Given (1), you could implement (2) yourself in VRL using the `parse_url()` function.
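As a rough sketch of implementing (2) via (1): assuming an `endpoint_tag = "endpoint"` option existed and had populated `.tags.endpoint` with the full URL, a remap transform along these lines could derive the instance tag with `parse_url` (untested, and the exact VRL fallibility handling may need adjusting):

```toml
[transforms.derive_instance]
type = "remap"
inputs = ["scrape_node"]
source = '''
# parse_url! aborts the event if the endpoint is not a valid URL
url = parse_url!(.tags.endpoint)
if url.fragment != null {
  # prefer the URL fragment as a human-meaningful instance name
  .tags.instance = url.fragment
} else {
  # fall back to host:port, like Prometheus's default __address__
  .tags.instance = string!(url.host) + ":" + to_string(url.port)
}
del(.tags.endpoint)
'''
```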

nivekuil commented 3 years ago

Having one source+transform per job would work, but it is cumbersome for a large or dynamic number of jobs. Typically with Prometheus you'd set the job label dynamically in `relabel_configs`, and I am wondering if it's possible to achieve something similar with VRL once you have the endpoint tag as proposed here.
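If the full endpoint URL were available as a tag, one way to get a dynamic job label without a transform per job would be a static lookup table inside a single remap transform (an untested sketch; the `endpoint` tag name is assumed from the proposal above):

```toml
[transforms.assign_job]
type = "remap"
inputs = ["scrape_node"]
source = '''
# map each scraped endpoint to its Prometheus-style job name
job_map = {
  "http://192.0.2.1:9100/metrics": "node",
  "http://192.0.2.2:9100/metrics": "node"
}
job = get!(job_map, [.tags.endpoint])
.tags.job = if job != null { job } else { "unknown" }
'''
```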


candlerb commented 3 years ago

With Prometheus, normally you'd set the instance label dynamically using relabelling.

For Vector: yes, if you've captured the whole endpoint URL in a tag, then using VRL to extract just the part that you want to appear in the instance label is straightforward enough. (Or in the job label if you want, or any other label - the logic is whatever you put in VRL)

In some cases, you want the instance label to be different from the target endpoint (e.g. you want a meaningful name in `instance`, but the scrape to be done against a particular IP address and port). Prometheus lets you do this via relabelling with regexp matches: you split `__address__` and then create a new `__address__` to be scraped.

For Vector, since the endpoint is a URL, the "fragment" part of the URL (the bit after `#`) can be used for that purpose, as it isn't sent when generating a scrape.

jszwedko commented 3 years ago

👍 https://github.com/vectordotdev/vector/pull/9330 should satisfy both requirements mentioned in https://github.com/vectordotdev/vector/issues/9322#issuecomment-927326443. Namely, it adds `instance_tag` and `endpoint_tag` configuration options.
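For reference, a source configuration using both new options might look something like this (a sketch based on the PR description; the tag values in the comments are illustrative):

```toml
[sources.scrape_node]
type = "prometheus_scrape"
endpoints = ["http://192.0.2.1:9100/metrics"]
# adds e.g. instance="192.0.2.1:9100" to each scraped metric
instance_tag = "instance"
# adds the full scraped URL as an "endpoint" tag
endpoint_tag = "endpoint"
```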