Closed: josusky closed this issue 1 year ago.
Given that HTTP is required for the canonical link in a message, could an update be published as a WNM with `properties.operation="update"` and then somehow use the HTTP ETag?
Hm, that might work in an HTTP-only world, but even there it is not optimal: it requires a request each time a WIS2 node (including the Global Systems) gets a "duplicate" notification, just to check whether there really is a change. Moreover, the link may also use SFTP (and possibly other protocols), so I would prefer a protocol-agnostic solution. Similarly, we do not use any MQTT-specific features to store properties; we put everything into the payload (the WNM JSON). Therefore, I would prefer to have the updates/corrections fully expressed in the WNM itself.
I agree with @josusky. We should come up with a WIS2-specific solution that does not rely on the underlying technology.
My opinion is that we
The second approach seems more beneficial to me than a simple version number. One reason the GTS has CCA and so forth is that it offers no way to use timestamps other than the date of observation.
@kaiwirt If I understand you right, you are proposing to use `data_id` + `pubtime` to uniquely identify a data instance when a system receives a notification and needs to decide whether to download the data or not.
This is a valid approach, and both these properties are "required" by the WNM schema. However, we must ensure that `pubtime` is never altered (especially not by the Global Caches); otherwise we would get an endless loop.
The use of `integrity` values would be a bit safer, in the sense that the value is more tightly related to the data. However, that property is optional at the moment. Its drawback is also that you can only compare it for "inequality", not for "is greater": if you get two notifications, you can tell that they are different, but you do not know which one is more up to date. For this purpose, `pubtime` is clearly the better option.
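The download decision discussed in this comment can be sketched in Python. This is a minimal illustration and is not part of any WIS2 tooling. It assumes the WNM fields `properties.data_id`, `properties.pubtime` (an RFC 3339 string) and the optional `properties.integrity`, whose `value` is assumed to be a Base64-encoded SHA-512 digest. The function names and the in-memory `seen` dictionary are hypothetical. Integrity is used only to detect identical bytes, while `pubtime` decides which of two differing notifications is newer.

```python
import base64
import hashlib
from datetime import datetime
from typing import Optional


def integrity_value(data: bytes) -> str:
    """Base64-encoded SHA-512 digest (assumed form of properties.integrity.value)."""
    return base64.b64encode(hashlib.sha512(data).digest()).decode("ascii")


def _integrity(notification: dict) -> Optional[str]:
    # properties.integrity is optional, so it may be absent
    integrity = notification["properties"].get("integrity")
    return integrity.get("value") if integrity else None


def _pubtime(notification: dict) -> datetime:
    # fromisoformat() accepts a trailing 'Z' only from Python 3.11 on
    raw = notification["properties"]["pubtime"].replace("Z", "+00:00")
    return datetime.fromisoformat(raw)


def should_download(seen: dict, notification: dict) -> bool:
    """Decide on a notification using data_id + pubtime (integrity as a hint).

    `seen` maps data_id -> the last accepted notification; updated in place.
    """
    data_id = notification["properties"]["data_id"]
    known = seen.get(data_id)
    if known is None:
        seen[data_id] = notification
        return True
    old_hash, new_hash = _integrity(known), _integrity(notification)
    if old_hash is not None and old_hash == new_hash:
        return False  # identical bytes, whatever the pubtime says
    # A different digest only says "changed"; pubtime says which one is newer.
    if _pubtime(notification) > _pubtime(known):
        seen[data_id] = notification
        return True
    return False  # duplicate, or an older publication arriving late
```

The sketch relies on `pubtime` never being rewritten downstream. If a Global Cache stamped its own `pubtime` when republishing, its notification would always look newer than the one it received, which is the endless loop described above.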
I propose to use `data_id` + `datetime` + `pubtime` ("this data, for this reference time, published at that pubtime") to uniquely identify a data instance. If there is a correction to that data, then `data_id` + `datetime` would stay the same, and `pubtime` would be altered to reflect that there has been another (newer) publication of that data instance.
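Here is a minimal sketch of this identification scheme. It assumes that each notification carries a single `properties.datetime` (not a start/end interval) and an RFC 3339 `properties.pubtime`. The function name, the return labels and the `latest` dictionary are hypothetical. The point is only that the key (`data_id`, `datetime`) stays fixed, while a later `pubtime` marks a correction.

```python
from datetime import datetime
from typing import Dict, Tuple

# (data_id, datetime) = "this data for this reference time"
InstanceKey = Tuple[str, datetime]


def _parse(value: str) -> datetime:
    # RFC 3339 with a trailing 'Z'; fromisoformat() accepts 'Z' only from 3.11
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def classify_publication(latest: Dict[InstanceKey, datetime], notification: dict) -> str:
    """Label a notification as 'new', 'correction' or 'ignore'.

    `latest` maps (data_id, datetime) to the newest pubtime seen so far
    and is updated in place.
    """
    props = notification["properties"]
    key = (props["data_id"], _parse(props["datetime"]))
    pubtime = _parse(props["pubtime"])
    previous = latest.get(key)
    if previous is None:
        latest[key] = pubtime
        return "new"
    if pubtime > previous:
        latest[key] = pubtime  # same data instance, newer publication
        return "correction"
    return "ignore"  # same or older pubtime: duplicate or outdated
```

If `data_id` is already unique per reference time, the key reduces to `data_id` alone and `datetime` adds nothing.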
The discussion on whether caches should or should not modify `pubtime` is here: https://github.com/wmo-im/wis2-guide/issues/33, and so far we have agreed not to modify `pubtime`.
`datetime` is possibly part of `data_id`, or there are different `data_id`s per datetime (the `metadata_id` could be the same for all datetimes). Therefore `data_id` + `pubtime` should be enough, right?
Hm, I think Antje is right: `data_id` is defined in WNM (7.1.7.2.) as "uniquely identifies the data as defined by the data producer", and for all other purposes (including deduplication) we have considered it unique for the data instance, so `datetime` is not necessary. Am I right, @tomkralidis?
In any case, so far we have a proposal to use `pubtime` to identify updates/corrections of an already published data instance. My gut feeling is that a more explicit "I am a correction" label in the WNM would be safer ... but I might get used to `pubtime` eventually.
TT-WISMD 2023-08-14:
- `properties.updated`?
- `properties.data_id` / `properties.pubtime` would ensure "latest and greatest" / overwrite

@josusky Recalling that discussion: there will be an explicit "I am a correction". See https://github.com/wmo-im/wis2-notification-message/issues/47
My proposal was just to use pubtime instead of a version number. Something like this:
This is related to wmo-im/wis2-notification-message#47 (and PR wmo-im/wis2-notification-message#48), which in turn refers to an excellent comment by @hhaddouch that can be rephrased as: "How can I, as a data producer, notify data consumers that I had to correct/update a data instance?" In the GTS world, this is solved by setting the `BBB` part of the abbreviated heading to `CCA`, `CCB`, etc., or in some cases to `AAA`, `AAB`, etc.

Possible solutions:
- `properties.operation` value `update`: the drawback is that it currently does not identify the version of the update, i.e. there is no way to update an update (especially taking into account that the notifications may not arrive in the right order, as each can take a different path through different Global Brokers and so on).
  - Possibly we could extend the `properties.operation` value to be an object that also has a version, or we could require each update/correction to use a different value of `properties.datetime` or `properties.pubtime`? But we might not want to change the former, as it may be part of the data instance identification (e.g. time of observation), while the latter is not preserved by the Global Cache, is it?
- `data_id` by the data producers: some producers have some sort of versioning expressed in the file name (which they translate into `data_id`), but it is not universal/standardised, except for the cases when the GTS file naming convention is used :-)
  - This makes sense in that we require the `data_id` to be unique; if the corrected version did not use a new `data_id`, we would also need to adjust the deduplication mechanisms to take into account the value of `properties.operation`...
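To illustrate the ordering problem in the first option, here is a minimal sketch of a consumer that applies updates in version order regardless of arrival order. It assumes a hypothetical extension in which `properties.operation` may be an object such as `{"type": "update", "version": 2}`, with a plain string value (or no value) counting as version 0. This object shape and the class and function names are illustrative assumptions, not part of the WNM schema.

```python
from typing import Dict


def update_version(notification: dict) -> int:
    """Version carried by a notification; 0 for a plain, unversioned one.

    HYPOTHETICAL: assumes properties.operation may be an object such as
    {"type": "update", "version": 2}. A string value or a missing key
    is treated as version 0 (the original publication).
    """
    op = notification["properties"].get("operation")
    if isinstance(op, dict):
        return int(op.get("version", 0))
    return 0


class LatestVersionStore:
    """Keeps the highest version seen per data_id, independent of arrival order."""

    def __init__(self) -> None:
        self._latest: Dict[str, int] = {}

    def accept(self, notification: dict) -> bool:
        """Return True if the notification should be processed (new or newer)."""
        data_id = notification["properties"]["data_id"]
        version = update_version(notification)
        current = self._latest.get(data_id)
        if current is not None and version <= current:
            return False  # duplicate, or a stale update that arrived late
        self._latest[data_id] = version
        return True
```

Because only the highest version per `data_id` is kept, an older update that arrives late via a slower Global Broker is simply dropped. The same comparison would work with `pubtime` in place of the version, provided `pubtime` is never rewritten.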