wmo-im / wis2-metric-hierarchy

Apache License 2.0

Lifecycle of metrics that cannot be initialised by setting to 0 #20

Open antje-s opened 3 months ago

antje-s commented 3 months ago

How do we want to keep the metric portfolio clean over long service periods?

Example: The metric wmo_wis2_gc_dataserver_status_flag (labels: centre_id|dataserver|report_by) has two values, each with an assigned state for the respective data server. Scenarios:

1) If a data server is replaced by a new one, the metric for the old server remains until the GC is restarted. If the last download from it failed, its status stays at error.
2) If a WIS2 Node is no longer in operation, the metrics for its data server are still included until the next GC restart.
3) When a WIS2 Node switches to inline data, the metric for the dataserver status is no longer updated (provided the inline content is ok and is used/preferred). The status is then only set when messages are received for products that do not contain inline data.

This could be prevented either by deleting the metric at regular intervals (e.g. every 24h or 1 week), or by adding a time label and deleting the metric once it has gone unchanged for a set period (e.g. 24h or 1 week). In either case, metrics would not be consistently available for all data servers. However, the series are already interrupted whenever the Global Service restarts.
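
To make the second option (deletion after a period without changes) more concrete, here is a minimal sketch of the bookkeeping it would need. It is only an illustration, not a proposal for the actual GC code: the class name `StaleSeriesTracker`, its methods, and the label values are all hypothetical. It uses only the standard library and keeps the last update time outside the metric itself, so no extra time label is required. The `on_expire` callback is where a GC would drop the series; with the Python `prometheus_client` library that could be something like `gauge.remove(*labels)`, assuming the metric is a labelled gauge.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# One entry per label combination, e.g. (centre_id, dataserver, report_by).
LabelValues = Tuple[str, ...]


class StaleSeriesTracker:
    """Hypothetical helper: remembers when each label set was last updated."""

    def __init__(
        self,
        ttl_seconds: float,
        on_expire: Optional[Callable[[LabelValues], None]] = None,
    ) -> None:
        self.ttl_seconds = ttl_seconds
        self.on_expire = on_expire
        self._last_update: Dict[LabelValues, float] = {}

    def touch(self, labels: LabelValues, now: Optional[float] = None) -> None:
        """Call whenever the metric for this label set is set or changed."""
        self._last_update[labels] = time.time() if now is None else now

    def expire(self, now: Optional[float] = None) -> List[LabelValues]:
        """Drop label sets not updated within the TTL and return them."""
        current = time.time() if now is None else now
        stale = [
            labels
            for labels, ts in self._last_update.items()
            if current - ts > self.ttl_seconds
        ]
        for labels in stale:
            del self._last_update[labels]
            if self.on_expire is not None:
                self.on_expire(labels)  # e.g. remove the series from the registry
        return stale


# Example with made-up label values and a 24h TTL (timestamps in seconds).
removed: List[LabelValues] = []
tracker = StaleSeriesTracker(ttl_seconds=24 * 3600, on_expire=removed.append)
tracker.touch(("centre-a", "old.example.org", "gc-x"), now=0)
tracker.touch(("centre-a", "new.example.org", "gc-x"), now=80_000)
expired = tracker.expire(now=90_000)  # only the old data server is stale
```

Running `expire()` periodically (for example once per hour) would cover scenarios 1 to 3: a replaced, retired, or inline-only data server simply stops being touched and its series disappears after the TTL. The trade-off described above still applies, since a data server that is quiet for longer than the TTL would temporarily have no series at all.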