OpenKAT scans networks, finds vulnerabilities and creates accessible reports. It integrates the most widely used network tools and scanning software into a modular framework, accesses external databases such as Shodan, and combines the information from all these sources into clear reports. It also includes lots of cat hair.
Please add one or more of the following labels to your issue: backend metrics
Is your feature request related to a problem? Please describe.
At the moment it is quite hard to know what the KAT stack does and how well it performs while doing it. This information is needed to predict the amount of resources required to run KAT at scale, and to add functional alerting and monitoring so that engineers can pick up issues with KAT.
Describe the solution you'd like
Preferably we get an HTTP(S) endpoint we can query that exposes metrics about the KAT components in Prometheus format (https://prometheus.io/docs/practices/instrumentation/). We are looking for metrics on as many aspects as possible, including (but not limited to) the runtime of the various operations of boefjes and the scheduler, and their respective resource usage (CPU, memory, network). We would also like to see statistical information (how many scans have been performed, how many concurrent scans are running, how many users are using the system, etc.).
Furthermore, this metrics endpoint can also be used as a health check by, for example, load balancers.
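As a rough illustration of what we have in mind, below is a minimal sketch using the prometheus_client Python library linked under "Additional context". The metric names, labels and port are assumptions made up for this example, not existing OpenKAT identifiers:

```python
# Hypothetical sketch: a KAT component exposing boefje runtime and scan metrics
# in Prometheus format. Metric names and port are illustrative only.
import random
import time

from prometheus_client import Counter, Gauge, Histogram, start_http_server

# Wall-clock runtime of individual boefje runs, labelled by boefje name.
BOEFJE_RUNTIME = Histogram(
    "kat_boefje_runtime_seconds",
    "Runtime of a single boefje run in seconds",
    ["boefje"],
)

# Total number of scans performed since the process started.
SCANS_TOTAL = Counter("kat_scans_total", "Number of scans performed")

# Number of scans currently in flight.
SCANS_RUNNING = Gauge("kat_scans_running", "Number of concurrently running scans")


def run_boefje(name: str) -> None:
    """Run a (stubbed) boefje while recording runtime and concurrency metrics."""
    SCANS_RUNNING.inc()
    with BOEFJE_RUNTIME.labels(boefje=name).time():
        time.sleep(random.uniform(0.1, 0.5))  # placeholder for the real work
    SCANS_RUNNING.dec()
    SCANS_TOTAL.inc()


if __name__ == "__main__":
    # Serve /metrics on port 8000; a load balancer could probe this same
    # endpoint as a basic liveness check.
    start_http_server(8000)
    while True:
        run_boefje("dns-records")
```

The Python client's default process collector already exposes basic per-process CPU and memory metrics (e.g. process_cpu_seconds_total, process_resident_memory_bytes) on Linux, which would cover part of the resource-usage request without extra work.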
Describe alternatives you've considered
With our setup (based on HashiCorp Nomad), we are only able to extract CPU and memory metrics for jobs if we write our own exporters. This information is nice, but it does not give us enough insight to perform trend analysis and predict usage. Another option would be to run a sidecar container that inspects the different parts of KAT, but we would need to reverse-engineer KAT to retrieve functional metrics. The sanest and most durable solution is to embed this information in the KAT processes themselves, since all the relevant information is already known to them.
Additional context
The following links help with setting up an exporter:
https://prometheus.io/docs/instrumenting/writing_exporters/
https://github.com/prometheus/client_python
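For the functional metrics, a custom collector as described in the writing-exporters guide could also be a starting point. The sketch below only shows the shape of such a collector; the queue size is a hard-coded placeholder and the metric name is made up:

```python
# Hypothetical sketch of the exporter pattern from the linked docs: a custom
# collector that gathers values at scrape time instead of instrumenting code
# directly. The queue size is a hard-coded placeholder, not a real KAT API call.
import time

from prometheus_client import start_http_server
from prometheus_client.core import GaugeMetricFamily, REGISTRY


class SchedulerCollector:
    def collect(self):
        # A real exporter would query the scheduler for its current queue depth;
        # a fixed value stands in here.
        queue_size = 42
        yield GaugeMetricFamily(
            "kat_scheduler_queue_size",
            "Number of tasks waiting in the scheduler queue (placeholder)",
            value=queue_size,
        )


if __name__ == "__main__":
    REGISTRY.register(SchedulerCollector())
    start_http_server(9100)  # arbitrary exporter port
    while True:
        time.sleep(60)
```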