wmo-im / wis2-guide

WIS2 Guide
https://wmo-im.github.io/wis2-guide
Apache License 2.0

How to push alerts back to originating WIS2node? #8

Open 6a6d74 opened 1 year ago

6a6d74 commented 1 year ago

See discussion about raising alerts in #7 and #6.

antje-s commented 1 year ago

If each GC can also decide not to cache data according to its own criteria, we should introduce a monitoring metric that captures the number of items not cached, so that a sync comparison between GCs remains possible. Should the GC refrain from re-publishing the message in such a case? If it simply distributes the message with the original data download URL, this can lead to problems (both with the download on the recipient side and for the origin WIS2 Node). Perhaps a report-notification could be sent instead of the data-notification. In that case, it has to be decided what the topic structure below reports should look like, so that each centre can quickly filter the notifications relevant to it.

golfvert commented 1 year ago

Global Caches shouldn't have the option not to cache data using their own criteria. For those old enough to remember the endless debate between the GISCs and beyond on the option to select what is cached or not, I think it is something we don't want to reopen in the WIS2 context. So, by default, core data are cached, except if the data producer decides (typically for too large datasets, APIs, ...) that its data shouldn't be cached.

Now, to address the issue raised here: that is where the GISCs come into play. The idea would be to inform the GISC and the WIS2 Node via a notification message that "something" is wrong. It is likely that this "something" will require human intervention. So the GISC and the WIS2 Node should act on the notification upon receiving it. Where and how those notifications are published would be the next step to discuss.

kaiwirt commented 1 year ago

@golfvert The main difference I see between caches in WIS1 and caches in WIS2 is that in WIS2 this is transparent. A user just follows the link. Whether the data is on Cache1, on Cache2, or on the producer's endpoint is transparent.

So there is no need to have the same data everywhere.

I agree that we want all caches to have all core data. But you list a few exceptions (data size etc.) which are not only a decision of the data producer. Whether a large data set can be downloaded is a decision of the cache.

So to cut a long story short: Caches should have all core data, but the design of WIS2 allows for more flexibility and there is no problem if a certain data set is missing at one cache (in contrast to WIS1).

golfvert commented 1 year ago

Yes, the design is flexible enough, so it is not an issue. I'd like to avoid someone proclaiming "Hey, I am a Global Cache" when in fact they are keeping a copy of their own data only :) I would also like to avoid the endless discussion we had on the 24h in the WIS1 context. The more SHALLs we have (not only on this), the fewer uncontrolled moving parts we have in our design. So, Global Cache SHALL keep a copy of all core data with cache: true (when properties.cache is missing, this is equivalent to true). Global Cache MAY keep a copy of data with cache: false if they wish to do so. @kaiwirt agreed?
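To make the proposed rule concrete, here is a minimal sketch of how a Global Cache might evaluate it for one notification. The function name `caching_obligation`, the `is_core` flag, and the return labels are hypothetical and only for illustration; the one thing taken from the proposal is that a missing `properties.cache` counts as true.

```python
from typing import Any, Mapping


def caching_obligation(properties: Mapping[str, Any], is_core: bool = True) -> str:
    """Classify a notification under the proposed Global Cache rule.

    `properties` stands for the notification's `properties` object.
    `is_core` is an assumed flag supplied by the caller; how core data is
    recognised is outside the scope of this sketch.
    """
    if not is_core:
        # The proposal only covers core data.
        return "out-of-scope"
    # A missing properties.cache is equivalent to cache: true.
    if properties.get("cache", True) is False:
        return "MAY"  # cache: false, the Global Cache may still keep a copy
    return "SHALL"  # cache: true or absent, the Global Cache shall keep a copy


print(caching_obligation({}))                # cache missing
print(caching_obligation({"cache": False}))  # producer opted out
```

With this reading, only an explicit `cache: false` from the data producer relaxes the obligation; the cache itself has no criteria of its own to apply.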

kaiwirt commented 1 year ago

OK

6a6d74 commented 1 year ago

We still need to figure out the "meta-problem" of how an alert or issue about data is sent back to the originating centre (and/or its affiliated GISC). This could be a "report" type message.

6a6d74 commented 11 months ago

This is about daily operations in WIS2 … so we’re looking for ET-OM to take a lead on proposing a solution.

golfvert commented 10 months ago

Agreed. This should be addressed by TT-OM.

tomkralidis commented 10 months ago

+1. Note that this was discussed at TT-WISMD this week. This will be required for GDCs to report back failures (KPIs will happen along the metrics/monitoring workflow).