medic / cht-watchdog

Configuration for deploying a monitoring/alerting stack for CHT
GNU Affero General Public License v3.0

Ensure API Express metrics are using correct protocol (http or https) #128

Closed: mrjones-plip closed this 5 hours ago

mrjones-plip commented 2 days ago

A production instance recently had port 80 accidentally stop working. As it turns out, the API Express metrics scrape always defaults to port 80 and then gets redirected to port 443, where the metrics are actually served. Without the redirect, the scrape fails.

Checking the production watchdog instance that Medic runs, I see all requests hitting port 80 first:

Image

mrjones-plip commented 2 days ago

tl;dr - I propose we force scrapes of express metrics to go over TLS. I've added a PR to this effect.


Soooo - this is tricky. On a local docker helper instance I had both port 10091 serving http and port 10454 serving https. By default the express metrics scrape job runs a regex that drops the scheme (https or http) used in cht-instances.yml - check out the regex line:

  - job_name: cht-express-metrics
    metrics_path: /api/v1/express-metrics
    file_sd_configs:
      - files:
        - '/etc/prometheus/cht-instances.yml'
    relabel_configs:
    - source_labels: [__address__]
      regex: "(?:https?:\\/\\/|)(?:www\\.|)(.*?)(?:\\/|)$"
      target_label: instance
      replacement: "$1"
    - source_labels: [instance]
      target_label: __address__

So what happens in this default scenario is:

  1. watchdog reads in the URL from cht-instances.yml as https://192-168-68-26.local-ip.medicmobile.org:10454
  2. then the regex above drops the https, and because the job sets no scheme, Prometheus falls back to its default of http and builds this URL: http://192-168-68-26.local-ip.medicmobile.org:10454/api/v1/express-metrics
  3. it does a scrape and of course errors out: server returned HTTP status 497 (nginx's non-standard code for a plain HTTP request sent to an HTTPS port) - translation - bruh - I can't speak http to an https server
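
To make the three steps above concrete, here is a rough sketch of the target file and the values each relabel step produces. The file layout is an assumption based on the standard Prometheus file_sd format, not copied from the real cht-instances.yml.

```yaml
# cht-instances.yml (layout assumed: standard Prometheus file_sd target list)
- targets:
    - https://192-168-68-26.local-ip.medicmobile.org:10454

# Trace through the relabel_configs of the cht-express-metrics job:
#   __address__ as read:    https://192-168-68-26.local-ip.medicmobile.org:10454
#   instance after regex:   192-168-68-26.local-ip.medicmobile.org:10454
#   __address__ rewritten:  192-168-68-26.local-ip.medicmobile.org:10454
#   scheme (not set in the job, so the default applies): http
#   resulting scrape URL:   http://192-168-68-26.local-ip.medicmobile.org:10454/api/v1/express-metrics
```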

The fix is one of these then:

  1. either we update the above scrape config file and add a scheme: "https" - which forces ALL requests to go to https - OR
  2. um...there isn't really a second one now that I think about it! I was going to say you can put http://192-168-68-26.local-ip.medicmobile.org:10091 in the cht-instances.yml file - but then in dev I had to comment out the redirect we have on the http port and add the server.conf include, and only then could it get both the JSON monitoring endpoint AND the express metrics. But this is so not real world that it's useless

By going with option 1 (scheme: "https") we break it for dev environments that don't use docker helper, but it will still work with docker helper, which seems common enough.
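
A minimal sketch of what option 1 could look like, assuming the job shown earlier is otherwise left unchanged. `scheme` is a standard Prometheus scrape config field that defaults to `http`. The exact placement and final form in the PR may differ.

```yaml
  - job_name: cht-express-metrics
    # Force every scrape of this job over TLS. Without this line Prometheus
    # uses http, because the relabel step below strips the scheme from the
    # target URL.
    scheme: https
    metrics_path: /api/v1/express-metrics
    file_sd_configs:
      - files:
        - '/etc/prometheus/cht-instances.yml'
    relabel_configs:
    # Strip the scheme, an optional "www." and a trailing slash, leaving host[:port].
    - source_labels: [__address__]
      regex: "(?:https?:\\/\\/|)(?:www\\.|)(.*?)(?:\\/|)$"
      target_label: instance
      replacement: "$1"
    # Use the cleaned host[:port] as the address to scrape.
    - source_labels: [instance]
      target_label: __address__
```

With this in place, an entry like https://192-168-68-26.local-ip.medicmobile.org:10454 would be scraped at https://192-168-68-26.local-ip.medicmobile.org:10454/api/v1/express-metrics, while an http-only target such as port 10091 would stop working for this job.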

I'll see where I get with the PR!

mrjones-plip commented 2 days ago

cc @jkuester - but I'm tapping Kenn for the PR review!