Closed: jszwedko closed this issue 3 years ago.
Am I reading the code in #9330 correctly that it extracts just the host and port from the endpoint URL?
I don't think this logic needs to be hard-coded. You could have an "endpoint" tag and let users massage it in VRL with parse_url() if they want it to be more Prometheus-like.
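A minimal Python sketch of the host-and-port reduction described above, assuming #9330 behaves as read here (this is not Vector's code; host_port and the two endpoints are hypothetical). It illustrates why keeping the full endpoint is more flexible: two distinct endpoints on the same host and port collapse to the same value, and nothing downstream can tell them apart.

```python
from urllib.parse import urlsplit


def host_port(endpoint: str) -> str:
    """Reduce an endpoint URL to "host:port", or just "host" if no port is given."""
    parts = urlsplit(endpoint)
    if parts.port is None:
        return parts.hostname or ""
    return f"{parts.hostname}:{parts.port}"


# Hypothetical endpoints: same host and port, different path and fragment.
endpoints = [
    "http://192.0.2.1:9100/metrics#foo",
    "http://192.0.2.1:9100/extra#bar",
]
reduced = [host_port(e) for e in endpoints]
print(reduced)  # ['192.0.2.1:9100', '192.0.2.1:9100']
```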
However, if this logic is going to be built in, I have a suggestion: if the URL contains a fragment identifier, use the fragment as the instance tag instead of address:port. For example:
endpoints = [
'http://192.0.2.1:9100/metrics#foo',
'http://192.0.2.2:9100/metrics#bar',
]
This would set the instance tag to 'foo' and 'bar' respectively for those scrapes, giving meaningful instance labels.
This is the sort of thing you can do with relabelling in Prometheus, but you couldn't do it here if the "instance" label had already been stripped down to host and port.
Or you could provide both endpoint_tag and instance_tag.
EDIT: a quick test shows that the fragment is already correctly stripped from the path in the HTTP request; tcpdump shows GET /metrics HTTP/1.1.
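This matches HTTP itself: the origin-form request target in HTTP/1.1 (RFC 7230, section 5.3.1) is the path plus an optional query, with no fragment component, so a client never sends the part after #. A small Python sketch of how a client would form that request line; request_target and request_line are hypothetical helpers, not Vector's code.

```python
from urllib.parse import urlsplit


def request_target(endpoint: str) -> str:
    """Build the origin-form request target: path plus optional query, never the fragment."""
    parts = urlsplit(endpoint)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def request_line(endpoint: str) -> str:
    """Format the first line of a GET request for the given endpoint."""
    return f"GET {request_target(endpoint)} HTTP/1.1"


print(request_line("http://192.0.2.1:9100/metrics#foo"))  # GET /metrics HTTP/1.1
```

Because the fragment never reaches the exporter, it is free to carry a label such as the instance name.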
Thanks for the thoughts @candlerb. I was thinking it'd be nice to match the Prometheus scraper's concept of instance, specifically since Prometheus users are likely to be familiar with it already. I could see adding support for labeling the endpoint as well, though, and then users could use VRL to extract the info they want as usual.
I think this would also address https://github.com/vectordotdev/vector/issues/6953?
I wonder if this would be enough information to infer the job label and be truly Prometheus-compatible. Could I have a transform that maps endpoint to job name?
It looks like this issue is indeed a dupe of #6953, yes.
When you say "infer the job label": in Prometheus the job label is static, fixed for all targets in the scrape job. You can set it like this with a remap transform in VRL:
[sources.scrape_node]
type = "prometheus_scrape"
endpoints = [
'http://192.0.2.1:9100/metrics#foo',
'http://192.0.2.2:9100/metrics#bar',
]
scrape_interval_secs = 60
[transforms.tag_node]
type = "remap"
inputs = ["scrape_node"]
source = '''
.tags.job = "node"
'''
This ticket is about the instance label, which identifies the individual scraped endpoint.
I'm proposing:
1. If you set endpoint_tag = "endpoint", then you get the whole URL as the tag.
2. If you set instance_tag = "instance", then you get the URL fragment or, if that doesn't exist, the address and port.
Given (1), you could implement (2) yourself in VRL using the parse_url() function.
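Given (1), rule (2) reduces to a few lines. In Vector this would be written in VRL with parse_url(); the sketch below expresses the same logic in Python on a plain dict of tags, so the tag names "endpoint" and "instance" and the helper names are assumptions for illustration, not Vector's implementation.

```python
from urllib.parse import urlsplit


def instance_from_endpoint(endpoint: str) -> str:
    """Rule (2): use the URL fragment if present, otherwise host and port."""
    parts = urlsplit(endpoint)
    if parts.fragment:
        return parts.fragment
    if parts.port is None:
        return parts.hostname or ""
    return f"{parts.hostname}:{parts.port}"


def add_instance_tag(tags: dict) -> dict:
    """Mimic a remap step: copy the tags and set "instance" derived from "endpoint"."""
    out = dict(tags)
    endpoint = out.get("endpoint")
    if endpoint is not None:
        out["instance"] = instance_from_endpoint(endpoint)
    return out


print(add_instance_tag({"endpoint": "http://192.0.2.1:9100/metrics#foo"}))
print(add_instance_tag({"endpoint": "http://192.0.2.2:9100/metrics"}))
```

An empty fragment (a trailing #) is treated as absent here and falls back to host:port; that is a design choice of this sketch.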
Having one source and transform per job would work, but it is cumbersome for a large or dynamic number of jobs. With Prometheus you'd typically set the job label dynamically in relabel_configs, and I'm wondering whether something similar is possible with VRL once you have the endpoint tag proposed here.
(In reply to Brian Candler, Sep 26, 2021, 8:44:25 AM.)
With Prometheus, you'd normally set the instance label dynamically using relabelling.
For Vector: yes, if you've captured the whole endpoint URL in a tag, then using VRL to extract just the part you want to appear in the instance label is straightforward enough. (Or in the job label if you want, or any other label; the logic is whatever you put in VRL.)
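The "map from endpoint to job name" asked about earlier can be whatever lookup the user writes. A Python sketch of one such rule set, assuming hypothetical regex patterns and job names (in Vector this logic would live in a remap transform's VRL). Rules are checked in order and the first match wins, similar in spirit to an ordered list of relabel rules.

```python
import re

# Hypothetical rules: (pattern searched for in the endpoint tag, job name).
JOB_RULES = [
    (re.compile(r":9100/"), "node"),
    (re.compile(r":9090/"), "prometheus"),
]


def job_for(endpoint: str, default: str = "unknown") -> str:
    """Return the job name of the first rule whose pattern matches the endpoint."""
    for pattern, job in JOB_RULES:
        if pattern.search(endpoint):
            return job
    return default


def add_job_tag(tags: dict) -> dict:
    """Copy the tags and set "job" from the "endpoint" tag, if present."""
    out = dict(tags)
    if "endpoint" in out:
        out["job"] = job_for(out["endpoint"])
    return out
```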
In some cases you want the instance label to differ from the target endpoint (e.g. you want a meaningful name in instance, but the scrape to go to a particular IP address and port). Prometheus lets you do this with relabelling: regex matches split __address__ and then build a new __address__ to be scraped.
For Vector, since the endpoint is a URL, the fragment (the part after #) can be used for that purpose, as it isn't used when making the scrape request.
👍 https://github.com/vectordotdev/vector/pull/9330 should satisfy both requirements mentioned in https://github.com/vectordotdev/vector/issues/9322#issuecomment-927326443: it adds instance_tag and endpoint_tag configuration options.
Discussed in https://github.com/vectordotdev/vector/discussions/9321