glogiotatidis opened this issue 4 years ago
@duallain How can we connect the Runners to Prom?
I understand we need the following:
1. Expose metrics from the runners (a metrics port / exporter on each runner).
2. Allow Prom network access to that port on the runner instances.
3. Have Prom discover and scrape the runners.
Can you advise on how to automate the second and third points?
1. I found this list of default ports; it looks like gitlab-exporter's default is normally 9168. (It's sort of arbitrary from our perspective, but if we want to try to avoid a collision later, maybe we can just ride on their coattails.) https://github.com/prometheus/prometheus/wiki/Default-port-allocations
2. Prom isn't a first-class object from the POV of networking; it's 'just' part of the k8s cluster. I think we should make a 'k8s-accessor' security group in each AWS region that we expect to be attached to the clusters (both existing and EKS). The gitlab runners could then authorize access with something like an inbound rule allowing port 9252 from the k8s-accessor SG.
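Untested, but roughly the rule I have in mind, written CloudFormation-style purely for illustration (the SG names here are made up; the real ones would come from however we manage the runner and cluster security groups):

```yaml
# Rough, untested sketch. GitlabRunnerSecurityGroup and K8sAccessorSecurityGroup
# are hypothetical names for the runners' SG and the shared 'k8s-accessor' SG.
Resources:
  RunnerMetricsIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref GitlabRunnerSecurityGroup        # SG attached to the runner instances
      IpProtocol: tcp
      FromPort: 9252                                 # gitlab-runner metrics port
      ToPort: 9252
      SourceSecurityGroupId: !Ref K8sAccessorSecurityGroup  # allow scrapes from cluster nodes
```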
3a. For automatic discovery, we can use the ec2 sd. The prom pods will need ec2 access to do list/get info for the instances, but that should be no problem (we likely just give the k8s nodes those perms).
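For the ec2 access bit, something like ec2:DescribeInstances on whatever role the prom pods (or the k8s nodes) run under should be enough for ec2 sd to list instances and read their tags. Untested sketch, again CloudFormation-style with a made-up role name:

```yaml
# Rough, untested sketch; K8sNodeRole is a hypothetical role name.
Resources:
  PromEc2SdPolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: prom-ec2-sd
      Roles:
        - !Ref K8sNodeRole
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - ec2:DescribeInstances   # lets ec2_sd enumerate instances and see their tags
            Resource: '*'
```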
The pattern I've used in the past is to attach some special tags to the ec2 instance. Something like:
prom_scrape: true
prom_port: 9252
prom_path: /metrics
sometimes with numbers in there somewhere to allow an instance to be scraped multiple times (prom1_scrape). It's possible to have an arbitrary number of scrape points, but it doesn't really seem worth the effort in my opinion (especially since node-exporter has a text collector, so if you had many things emitting metrics it could be used as an intermediate collector). So instead, we copy a single scrape job and just increment it a few (3?) times to allow multiple endpoints on one instance to be scraped.
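Concretely, the tags on a runner instance would end up looking something like this (hypothetical values; the prom1_ names have to match the __meta_ec2_tag_* labels used in the relabel rules below):

```yaml
# Hypothetical tag key/value pairs on a runner instance.
prom1_scrape: "true"
prom1_port: "9252"      # gitlab-runner metrics port
prom1_path: "/metrics"
```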
Example config, using the labels from above as a starting point.
- job_name: 'node'
  ec2_sd_configs:
    - refresh_interval: 120s  # region may also need to be set here, depending on where prom runs
  relabel_configs:
    # Only scrape instances with the prom1_scrape tag
    - source_labels: [__meta_ec2_tag_prom1_scrape]
      regex: 'true'
      action: keep
    # Not at all tested, but the goal is to use the port tag + private IP
    # to set what prom will scrape
    - source_labels: [__meta_ec2_private_ip, __meta_ec2_tag_prom1_port]
      regex: '(.*);(.*)'  # Captures the private IP and the port tag.
      target_label: __address__
      replacement: '${1}:${2}'
    # Also not tested, but we're setting the magic __metrics_path__ to the value of the ec2 tag
    - source_labels: [__meta_ec2_tag_prom1_path]
      regex: '(.*)'  # This is the default regex.
      target_label: __metrics_path__
      replacement: '${1}'
3b. Then we need to feed that to the prom deployment. As an example from the prom_sauron deployment: https://github.com/mozmeao/infra-services/blob/master/prom_sauron/helm/helm_configs/server.yml#L4 The tl;dr is: take the config above, add it to a helm values file like that one, then wire the yml file into the bash script that deploys prom (hopefully traceable if you look at server.yml references in prom_sauron).
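Untested, but the helm values bit would look roughly like this, assuming the chart accepts extra scrape configs via a key such as extraScrapeConfigs (that key name is an assumption; check server.yml in prom_sauron for whatever this deployment actually uses):

```yaml
# Untested sketch; 'extraScrapeConfigs' is an assumed values key,
# confirm against server.yml before relying on it.
extraScrapeConfigs: |
  - job_name: 'gitlab-runner'
    ec2_sd_configs:
      - refresh_interval: 120s
    relabel_configs:
      - source_labels: [__meta_ec2_tag_prom1_scrape]
        regex: 'true'
        action: keep
```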
Full reference: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config
Very simple 'keep' label: https://www.robustperception.io/automatically-monitoring-ec2-instances
Good example of ec2 magic labels: https://www.robustperception.io/controlling-the-instance-label
A bit further down that default-port list there's another entry, for the GitLab Runner exporter, which uses port 9252.
Ahh, classic case of multiple things with the same name tripping me up. Glad you saw the list and used it.
We could consider installing the node-exporter and exposing it as well. Would give us metrics on diskspace and other node related items.
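If we go that way, node-exporter would just be a second scrape target on the same instances. Untested, but following the prom1_/prom2_ tag-increment pattern above, and assuming node-exporter on its usual default port 9100, the extra job would look something like:

```yaml
# Untested sketch: a second tag set (prom2_*) on the same instances,
# pointing at node-exporter's usual default port (9100).
- job_name: 'node-exporter'
  ec2_sd_configs:
    - refresh_interval: 120s
  relabel_configs:
    - source_labels: [__meta_ec2_tag_prom2_scrape]
      regex: 'true'
      action: keep
    - source_labels: [__meta_ec2_private_ip, __meta_ec2_tag_prom2_port]
      regex: '(.*);(.*)'
      target_label: __address__
      replacement: '${1}:${2}'
```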
Good point on node-exporter. Will PR
(my other half says to just create a GitLab scheduled job to curl a DeadMansSnitch to make sure that everything works on the runners :)
https://docs.gitlab.com/runner/monitoring/