How do we want to keep the metric portfolio clean over long service periods?
Example:
The metric wmo_wis2_gc_dataserver_status_flag (labels: centre_id|dataserver|report_by) takes one of two values, each representing a state of the respective data server.
Scenarios:
1) If a data server is replaced by a new one, the metric for the old server remains until the GC is restarted. If the last download from the old server failed, its status stays at error.
2) If a WIS2 Node is no longer in operation, the metrics for its data server remain exposed until the next GC restart.
3) When a WIS2 Node switches to inline data, the dataserver status metric is no longer updated (as long as the inline content is ok and is used/preferred). The status is only set again when messages arrive for products that do not contain inline data.
This could be prevented by deleting the metric regularly after a fixed period (e.g. 24h/1 week), or by adding a time label and deleting the metric once it has gone unchanged for a given period (e.g. 24h/1 week). In that case, metrics are not consistently available for all data servers. However, the series are also interrupted whenever the Global Service restarts.
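
To make the second option (delete after a period without changes) more concrete, here is a minimal sketch in Python. It is not taken from any existing GC implementation: the class name ExpiringStatusMetric, its methods, the injectable clock and the 0/1 values in the comments are all assumptions for illustration. Instead of a time label it keeps the last-update time internally, which avoids creating a new series on every update.

```python
import time
from typing import Callable, Dict, List, Tuple

# Label set of wmo_wis2_gc_dataserver_status_flag: (centre_id, dataserver, report_by)
LabelSet = Tuple[str, str, str]


class ExpiringStatusMetric:
    """Hypothetical in-memory store that forgets label sets not updated recently."""

    def __init__(self, max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age_seconds = max_age_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._values: Dict[LabelSet, int] = {}
        self._last_update: Dict[LabelSet, float] = {}

    def set(self, labels: LabelSet, value: int) -> None:
        """Record a status value (e.g. 0 or 1, assumed) and refresh its timestamp."""
        self._values[labels] = value
        self._last_update[labels] = self._clock()

    def expire_stale(self) -> List[LabelSet]:
        """Drop every label set whose last update is older than max_age_seconds."""
        now = self._clock()
        stale = [labels for labels, ts in self._last_update.items()
                 if now - ts > self.max_age_seconds]
        for labels in stale:
            del self._values[labels]
            del self._last_update[labels]
        return stale

    def snapshot(self) -> Dict[LabelSet, int]:
        """Return a copy of the label sets that are currently exposed."""
        return dict(self._values)
```

A GC would call expire_stale() periodically (e.g. once per hour with max_age_seconds set to 24h or 1 week) and would additionally have to remove the returned label sets from whatever metrics client library it uses. This covers scenarios 1 and 2; for scenario 3 the status series of a node that only sends inline data would also disappear after the period, which may or may not be what we want.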