wmo-im / wis2-notification-message

WIS 2.0 MQP message to notify users of availability of new data
https://wmo-im.github.io/wis2-notification-message

How to notify about correction or similar type of data update #50

Closed josusky closed 1 year ago

josusky commented 1 year ago

This is related to wmo-im/wis2-notification-message#47 (and PR wmo-im/wis2-notification-message#48), which in turn refers to an excellent comment by @hhaddouch that can be rephrased as: "How can I, as a data producer, notify data consumers that I had to correct/update a data instance?". In the GTS world this is solved by setting the BBB part of the abbreviated heading to CCA, CCB, etc. or in some cases to AAA, AAB, etc.

Possible solutions:

  1. Use the properties.operation value update. The drawback is that it currently does not identify the version of the update, i.e. there is no way to tell successive updates of the same data instance apart (especially as the notifications may not arrive in the right order, since each can take a different path, through different Global Brokers and so on). Possibly we could extend properties.operation to be an object that also carries a version, or we could require each update/correction to use a different value of properties.datetime or properties.pubtime? But we might not want to change the first one, as it may be part of the data instance identification (e.g. time of observation), while the other is not preserved by the Global Cache, is it?
  2. Add one more property that identifies the data instance version.
  3. Leave this to the creative use of data_id by the data producers. Some producers have some sort of versioning expressed in the file name (that they translate into data_id), but it is not universal/standardised, except for the cases when the GTS file naming convention is used :-) This makes sense in that we require the data_id to be unique; if the corrected version did not use a new data_id, we would also need to adjust the deduplication mechanisms to take the value of properties.operation into account ...

tomkralidis commented 1 year ago

Given HTTP is a requirement to put forth a canonical link in a message, can an update be put forth as a WNM with properties.operation="update", and then somehow utilize HTTP ETag?

josusky commented 1 year ago

Hm, that might be a way in an HTTP-only world but even there it is not optimal. It requires a request each time a WIS2 node (including the Global Systems) gets a "duplicate" notification - to check if there is really a change or not. Moreover, the link may also use SFTP (and possibly other protocols), so I would prefer a protocol-agnostic solution. Similarly, we do not use any MQTT-specific features to store properties but we put everything into the payload (WNM JSON). Therefore, I would prefer to have the updates/corrections fully expressed in the WNM itself.

kaiwirt commented 1 year ago

I agree with @josusky. We should come up with a solution that is WIS2 specific (and does not rely on the underlying technology).

My opinion is, that we

The second approach seems more beneficial to me than having a simple version number. One reason GTS has CCA and so forth is that there is no way to use timestamps other than the date of observation.

josusky commented 1 year ago

@kaiwirt If I understand you right, you are proposing to use data_id + pubtime to uniquely identify a data instance when a system receives a notification and needs to decide whether to download the data or not. This is a valid approach, and both these properties are "required" by the WNM schema. However, we must ensure that pubtime is never altered (especially not by the Global Caches), otherwise we would get an endless loop. Using the integrity value would be a bit safer in the sense that it is more tightly related to the data. However, that property is optional at the moment. Its other drawback is that you can only compare it for "inequality", not for "is greater": if you get two notifications, you can tell that they are different but you do not know which one is more up-to-date. For this purpose, pubtime is clearly the better option.
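To illustrate the difference between the two comparisons, here is a rough Python sketch. It assumes both notifications carry properties.pubtime as an RFC 3339 string and the optional properties.integrity object with a value field; the helper names (parse_pubtime, differs, newer) and the sample messages are made up for this example.

```python
from datetime import datetime


def parse_pubtime(value: str) -> datetime:
    # Normalise a trailing "Z" so that older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def differs(a: dict, b: dict) -> bool:
    """Integrity values only tell us that two data instances are different."""
    return a["properties"]["integrity"]["value"] != b["properties"]["integrity"]["value"]


def newer(a: dict, b: dict) -> dict:
    """pubtime additionally tells us which notification is more recent."""
    a_time = parse_pubtime(a["properties"]["pubtime"])
    b_time = parse_pubtime(b["properties"]["pubtime"])
    return a if a_time >= b_time else b


# Hypothetical notifications for the same data_id (values are illustrative only).
original = {"properties": {
    "data_id": "example/obs/1",
    "pubtime": "2023-08-14T10:00:00Z",
    "integrity": {"method": "sha512", "value": "aaaa"},
}}
correction = {"properties": {
    "data_id": "example/obs/1",
    "pubtime": "2023-08-14T11:30:00Z",
    "integrity": {"method": "sha512", "value": "bbbb"},
}}
```

With these two messages, differs() reports a change regardless of argument order, while newer() picks the correction even if it arrived first.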

kaiwirt commented 1 year ago

I propose to use data_id + datetime + pubtime (meaning: this data, for this reference time, published at that pubtime) to uniquely identify a data instance. If there is a correction to that data, then data_id + datetime would stay the same and pubtime would change to reflect that there has been another (newer) publication of that data instance.

The discussion whether caches should or should not modify the pubtime is here: https://github.com/wmo-im/wis2-guide/issues/33 and so far we have agreed to not modify pubtime.

antje-s commented 1 year ago

datetime is possibly already part of data_id, or there are different data_ids per datetime (the metadata_id could be the same for all datetimes). Therefore data_id + pubtime should be enough, right?

josusky commented 1 year ago

Hm, I think Antje is right: data_id is defined in WNM (7.1.7.2.) as "uniquely identifies the data as defined by the data producer", and I think for all other purposes (including deduplication) we considered it to be unique for the data instance, so the datetime is not necessary, am I right @tomkralidis? In any case, so far we have a proposal to use pubtime to identify updates/corrections of an already published data instance. My gut feeling is that having a more explicit "I am a correction" label in the WNM would be safer ... but I might get used to pubtime eventually.

tomkralidis commented 1 year ago

TT-WISMD 2023-08-14:

kaiwirt commented 1 year ago

@josusky Recalling that discussion: There will be an explicit "I am a correction": See https://github.com/wmo-im/wis2-notification-message/issues/47

My proposal was just to use pubtime instead of a version number. Something like this:

  1. Receive message
  2. If event.type missing
    1. If data already downloaded do nothing
    2. else download
  3. If event.type = update
    1. if data already downloaded
      1. Check pubtime if message is pointing to a newer file
      2. Download only if newer
    2. else download
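The steps above could be sketched roughly like this in Python. This is only an illustration under some assumptions: the caller has already worked out whether the message is flagged as an update (however that ends up being expressed in the WNM), pubtime is an RFC 3339 string, and the subscriber keeps a map from data_id to the pubtime of the copy it already holds. The names should_download and downloaded are made up for the example.

```python
from datetime import datetime
from typing import Dict


def parse_pubtime(value: str) -> datetime:
    # Normalise a trailing "Z" so that older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def should_download(data_id: str, pubtime: str, is_update: bool,
                    downloaded: Dict[str, str]) -> bool:
    """Decide whether to fetch the data a notification points to.

    `downloaded` maps data_id -> pubtime of the copy already held locally.
    """
    if data_id not in downloaded:
        return True   # steps 2.2 and 3.2: nothing held yet, download
    if not is_update:
        return False  # step 2.1: plain duplicate, do nothing
    # step 3.1: download only if the message points to a newer file
    return parse_pubtime(pubtime) > parse_pubtime(downloaded[data_id])
```

An update notification that arrives late, with a pubtime older than the copy already held, is ignored, which covers notifications arriving out of order.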
tomkralidis commented 1 year ago

Closing in lieu of https://github.com/wmo-im/wis2-notification-message/issues/47#issuecomment-1695689179