mozmeao / infra

Mozilla Marketing Engineering and Operations Infrastructure
https://mozilla.github.io/meao/
Mozilla Public License 2.0

Connect gitlab runners to prometheus #1264

Open glogiotatidis opened 4 years ago

glogiotatidis commented 4 years ago

https://docs.gitlab.com/runner/monitoring/

glogiotatidis commented 4 years ago

@duallain How can we connect the Runners to Prom?

I understand we need the following:

1. Know which port the runner's metrics exporter listens on.
2. Open network access from the Prometheus cluster to that port on the runner instances.
3. Configure Prometheus to discover and scrape the runners.

Can you advise on how to automate the second and third points?

duallain commented 4 years ago

1. I found this article with default ports; it looks like gitlab-exporter's default is normally 9168. (It's sort of arbitrary from our perspective, but if we want to avoid a collision later, maybe we can just ride on their coattails.) https://github.com/prometheus/prometheus/wiki/Default-port-allocations

2. Prom isn't a first-class object from the POV of networking; it's 'just' part of the k8s cluster. I think we should make a 'k8s-accessor' security group in each AWS region, which we'd expect to be attached to the clusters (both existing and EKS). The gitlab runners could then authorize access with something like an inbound rule allowing port 9252 from the k8s-accessor SG.
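Purely as an illustration of that inbound rule (untested sketch; the names GitlabRunnerSG and K8sAccessorSG are placeholders, and we may well end up expressing this in Terraform or the console instead), in CloudFormation-style YAML:

    # Hypothetical sketch only: allow the k8s-accessor SG to reach the runner metrics port.
    RunnerMetricsIngress:
      Type: AWS::EC2::SecurityGroupIngress
      Properties:
        GroupId: !Ref GitlabRunnerSG                 # SG attached to the runner instances
        IpProtocol: tcp
        FromPort: 9252                               # runner metrics port
        ToPort: 9252
        SourceSecurityGroupId: !Ref K8sAccessorSG    # the shared k8s-accessor SG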

3a. For automatic discovery, we can use the EC2 service discovery (ec2_sd_configs). The Prom pods will need EC2 API access to list/describe the instances, but that should be no problem (we likely just give the k8s nodes those permissions).

The pattern I've used in the past is to attach some special tags to the EC2 instance. Something like prom_scrape:true prom_port:9252 prom_path:/metrics, sometimes with numbers in there somewhere to allow an instance to be scraped multiple times (prom1_scrape). It's possible to have an arbitrary number of scrape points, but it doesn't really seem worth the effort in my opinion (especially since node-exporter has a textfile collector, so if you had many things emitting metrics it could be used as an intermediate collector). So instead, we copy a single scrape job and just increment it a few (3?) times to allow multiple endpoints on one instance to be scraped.

Example config, using the labels from above as a starting point.

  - job_name: 'node'
    ec2_sd_configs:
      - refresh_interval: 120s
        # region: <aws-region>   # set explicitly if Prometheus can't infer it from EC2 instance metadata
    relabel_configs:
      # Only scrape instances with the prom1_scrape tag
      - source_labels: [__meta_ec2_tag_prom1_scrape]
        regex: 'true'
        action: keep
      # Not at all tested, but the goal is to use the port tag + private IP to set what Prom will scrape
      - source_labels: [__meta_ec2_private_ip, __meta_ec2_tag_prom1_port]
        regex: '(.*);(.*)'
        target_label: __address__
        replacement: '${1}:${2}'
      # Also not tested, but we're setting the magic __metrics_path__ to the value of the EC2 tag
      - source_labels: [__meta_ec2_tag_prom1_path]
        regex: '(.*)'             # This is the default value.
        target_label: __metrics_path__
        replacement: '${1}'

3b. Then, we need to feed that config to the Prom deployment. As an example from the prom_sauron deployment: https://github.com/mozmeao/infra-services/blob/master/prom_sauron/helm/helm_configs/server.yml#L4 The tl;dr is: take the config above, add it to a helm values file like that one, then wire the yml file into the bash script that deploys Prom (hopefully traceable if you look for server.yml references in prom_sauron).
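Roughly, and untested, that wiring could look like the following. This assumes the chart exposes an extraScrapeConfigs value the way the upstream prometheus chart does; the exact key used in prom_sauron's server.yml may differ:

    # helm_configs/server.yml (sketch; assumes the upstream prometheus chart's
    # extraScrapeConfigs value, the real key in our server.yml may differ)
    extraScrapeConfigs: |
      - job_name: 'gitlab-runner'
        ec2_sd_configs:
          - refresh_interval: 120s
        relabel_configs:
          - source_labels: [__meta_ec2_tag_prom1_scrape]
            regex: 'true'
            action: keep
          # ...plus the address/path relabel rules from the example above

The deploy script would then pass this values file to helm via -f/--values, the same way the prom_sauron deploy already handles server.yml.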

Full reference: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#ec2_sd_config
Very simple 'keep' label: https://www.robustperception.io/automatically-monitoring-ec2-instances
Good example of ec2 magic labels: https://www.robustperception.io/controlling-the-instance-label

glogiotatidis commented 4 years ago

> I found this article with default ports; it looks like gitlab-exporter's default is normally 9168. (It's sort of arbitrary from our perspective, but if we want to avoid a collision later, maybe we can just ride on their coattails.) https://github.com/prometheus/prometheus/wiki/Default-port-allocations

A bit further down this page there's another entry for GitLab Runner exporter which uses port 9252.

duallain commented 4 years ago

> > I found this article with default ports; it looks like gitlab-exporter's default is normally 9168. (It's sort of arbitrary from our perspective, but if we want to avoid a collision later, maybe we can just ride on their coattails.) https://github.com/prometheus/prometheus/wiki/Default-port-allocations
>
> A bit further down this page there's another entry for GitLab Runner exporter which uses port 9252.

Ahh, classic multiple things with the same name tripping me up. Glad you saw the list and used it.

duallain commented 4 years ago

We could consider installing the node-exporter and exposing it as well. Would give us metrics on disk space and other node-related items.
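For reference, node-exporter's default port is 9100, so under the tag scheme sketched earlier a runner instance exposing both endpoints might carry tags along these lines (hypothetical values, just to show the incremented prom2_* naming; nothing here is decided):

    # Hypothetical EC2 tags on a runner instance, following the prom1_/prom2_
    # convention from the discovery sketch above
    prom1_scrape: "true"
    prom1_port: "9252"      # gitlab-runner metrics
    prom1_path: "/metrics"
    prom2_scrape: "true"
    prom2_port: "9100"      # node-exporter default
    prom2_path: "/metrics"

A second scrape job keyed on the prom2_* tags (a copy of the example config above with the names bumped) would then pick it up.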

glogiotatidis commented 4 years ago

Good point on node-exporter. Will PR

(my other half says to just create a GitLab scheduled job to curl a DeadMansSnitch to make sure that everything works on the runners :)

glogiotatidis commented 4 years ago

> We could consider installing the node-exporter and exposing it as well. Would give us metrics on disk space and other node-related items.

https://github.com/mozmeao/infra-services/pull/52