skitt closed this 3 years ago.
Basic metric support is in place in the operator (https://github.com/submariner-io/submariner-operator/pull/549) and in the engine (https://github.com/submariner-io/submariner/pull/843). To make all this usable and maintainable, we need to:
[x] add metric tests outside the operator (see https://github.com/submariner-io/submariner-operator/pull/627 for an example);
[x] (skitt) ensure all the metrics we provide are automatically configured as far as possible, for integration into a running Prometheus instance (OpenShift’s or any other) — https://github.com/submariner-io/submariner-operator/issues/905;
[ ] (maayanf24) add relevant internal metrics, useful to track the health of the containers we use (see https://github.com/submariner-io/submariner/pull/845 for an example);
[x] (maayanf24) add relevant operational metrics, as discussed in https://github.com/submariner-io/submariner/issues/678 (see https://github.com/submariner-io/submariner/pull/846 for a Libreswan example);
[x] (skitt) check the CoreDNS metrics: https://github.com/submariner-io/lighthouse/issues/276.
[x] (maayanf24, P2) Active Gateway for the cluster and its uptime. The operator should already track the gateway; we need to add the uptime: https://github.com/submariner-io/submariner-operator/pull/994
[x] (P3) Number of clusters under control
[x] (maayanf24, P4) Globalnet https://github.com/submariner-io/submariner/issues/890
[ ] (P5) Lighthouse https://github.com/submariner-io/lighthouse/issues/349
[ ] (P6) Broker https://github.com/submariner-io/admiral/issues/136
For the operational metrics, we need some design work; the main questions to answer are:
Should metrics be part of an internal API, or should we instead agree on metric names and types (including labels etc.), and leave the implementation to each metric provider?
We’re providing a shared API: see https://github.com/submariner-io/submariner/pull/923
What metrics do we want, and how should they appear?
See https://github.com/submariner-io/submariner/issues/678 for a list. https://github.com/submariner-io/submariner/pull/899 implements a number of these; support for them still needs to be added to the shared API.
What log messages should we turn into metrics so that they can easily be turned into alerts?
See also https://github.com/submariner-io/submariner-operator/issues/532.
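To make the "agree on metric names and types (including labels)" idea concrete, here is a rough, stdlib-only Go sketch in which each metric's name, type, help text, and label names are defined once and providers only supply values, rendered in the Prometheus text exposition format. This is not the API from https://github.com/submariner-io/submariner/pull/923; the `MetricDesc`/`Registry` types and every `example_*` metric name and label value are hypothetical, and the registry is not safe for concurrent use. `main` also illustrates one possible answer to the log-message question: log a condition and increment a counter at the same point, so an alert can be defined on the counter.

```go
package main

import (
	"fmt"
	"log"
	"sort"
	"strconv"
	"strings"
)

// MetricDesc is a hypothetical shared definition: name, help, type ("counter" or
// "gauge") and label names are agreed once; providers only supply values.
type MetricDesc struct {
	Name, Help, Type string
	Labels           []string
}

// Registry holds the agreed definitions and each series' value (not goroutine-safe).
type Registry struct {
	order  []string                      // registration order, for stable output
	descs  map[string]MetricDesc         // metric name -> definition
	series map[string]map[string]float64 // metric name -> rendered label set -> value
}

func NewRegistry(descs ...MetricDesc) *Registry {
	r := &Registry{descs: map[string]MetricDesc{}, series: map[string]map[string]float64{}}
	for _, d := range descs {
		r.order = append(r.order, d.Name)
		r.descs[d.Name], r.series[d.Name] = d, map[string]float64{}
	}
	return r
}

var escaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`) // label-value escaping

// update checks type and label count, renders `{k="v",...}` and applies f to that series.
func (r *Registry) update(name, wantType string, values []string, f func(float64) float64) error {
	d, ok := r.descs[name]
	if !ok || d.Type != wantType || len(values) != len(d.Labels) {
		return fmt.Errorf("%s: not a known %s, or wrong number of label values", name, wantType)
	}
	parts := make([]string, len(values))
	for i, v := range values {
		parts[i] = d.Labels[i] + `="` + escaper.Replace(v) + `"`
	}
	key := "" // metrics without labels are written without braces
	if len(parts) > 0 {
		key = "{" + strings.Join(parts, ",") + "}"
	}
	r.series[name][key] = f(r.series[name][key])
	return nil
}

// Add increments a counter; counters may only go up.
func (r *Registry) Add(name string, delta float64, values ...string) error {
	if delta < 0 {
		return fmt.Errorf("%s: counters cannot decrease", name)
	}
	return r.update(name, "counter", values, func(old float64) float64 { return old + delta })
}

// Set overwrites the value of a gauge.
func (r *Registry) Set(name string, value float64, values ...string) error {
	return r.update(name, "gauge", values, func(float64) float64 { return value })
}

// Expose renders every series in the Prometheus text exposition format.
func (r *Registry) Expose() string {
	var b strings.Builder
	for _, n := range r.order {
		fmt.Fprintf(&b, "# HELP %s %s\n# TYPE %s %s\n", n, r.descs[n].Help, n, r.descs[n].Type)
		keys := make([]string, 0, len(r.series[n]))
		for k := range r.series[n] {
			keys = append(keys, k)
		}
		sort.Strings(keys) // map iteration order is random; sort for stable output
		for _, k := range keys {
			fmt.Fprintf(&b, "%s%s %s\n", n, k, strconv.FormatFloat(r.series[n][k], 'g', -1, 64))
		}
	}
	return b.String()
}

func main() {
	// Hypothetical metric names and label values, for illustration only.
	reg := NewRegistry(
		MetricDesc{"example_gateway_connections", "Connections to remote clusters.", "gauge", []string{"remote_cluster", "status"}},
		MetricDesc{"example_cable_errors_total", "Cable driver errors.", "counter", []string{"reason"}},
	)
	if err := reg.Set("example_gateway_connections", 1, "cluster-b", "connected"); err != nil {
		log.Fatal(err)
	}
	// Log a condition and count it at the same point, so an alert can fire on the counter.
	log.Print("failed to establish tunnel: timeout")
	if err := reg.Add("example_cable_errors_total", 1, "timeout"); err != nil {
		log.Fatal(err)
	}
	fmt.Print(reg.Expose())
}
```

A production version would more likely build on the Prometheus Go client library (`github.com/prometheus/client_golang`), which already handles registration, concurrency, and exposition; the sketch only shows what fixing names, types, and labels in one shared place could look like.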
@maayanf24, can you please close this epic and track the individual items we want to see improved separately?