wmo-im / wis2-guide

WIS2 Guide
https://wmo-im.github.io/wis2-guide
Apache License 2.0

Best practices for publishing numerical weather prediction data in WIS2 #158

Open 6a6d74 opened 1 month ago

6a6d74 commented 1 month ago

At Met Office we're trying to work out the best way to migrate our existing NWP GRIB bulletins from GTS into WIS2.

In GTS, the bulletins are packaged by physical parameter. This allows data users to select which GRIB files to retrieve based on the parameter. The view from the Met Office team is that this is still a valid requirement: the files are big - so there's value in downloading only those that are of interest. So the question is: how do we cater for users who want to download only specific parameters from a model run?

Options are listed below.

Option 1: Magic "filenames"

Users are somehow able to parse the "data-id" attribute in the WIS Notification Message to identify that a given data item relates to a specific parameter.

Pros: No changes required to WIS2.

Cons: Embedding metadata in filenames is poor practice - how do users know how to parse the filename to extract the parameter, what controlled vocab is used for the parameter, etc. A file-naming convention would need to be established that defines this, and every model data publisher would need to implement it. Maybe the GTS file-naming convention could be reused - but then we're binding GTS into WIS2, which I think is not helpful.

Option 2: Fine-grained dataset definition and subscription topics

Allow users to subscribe to topics that align with specific parameters so they only receive notification messages about model data containing the parameters they're interested in.

Pros: No changes required to WIS2. Simple for users - all the filtering is done "server-side", i.e., a user only subscribes to what they want.

Cons: Complex for data publishers, (i) publishing notifications for model output on multiple topics, (ii) all those topics need to be registered, (iii) because the subscription topic is defined in the discovery metadata, a proliferation of topics implies a proliferation of discovery metadata records all of which need to be managed. This would lead toward fine-grained discovery metadata records - and a problem similar to WIS1 where we had too many metadata records.

It is possible to put multiple subscription end-points/topics in one discovery metadata file but how would a user know which topic relates to which parameter? We're into a similar problem of having magic filenames (see option 1).
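To make the "server-side" filtering in Option 2 concrete, here is a minimal sketch of how MQTT topic filters select messages. The per-parameter sub-levels (`wind`, `air-temperature`, `precipitation`) and the centre name `example-centre` are invented for illustration and are not part of any agreed topic hierarchy; only the `+` and `#` wildcard semantics come from MQTT itself.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a topic.

    '+' matches exactly one level; '#' (assumed to be the last level
    of the filter) matches the remaining levels, including none.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard: everything from here on matches
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level differs
    return len(filter_levels) == len(topic_levels)


# Hypothetical base topic and per-parameter sub-topics (illustration only).
BASE = "origin/a/wis2/example-centre/data/core/weather/prediction/forecast"
published = [BASE + "/air-temperature", BASE + "/wind", BASE + "/precipitation"]

# A user interested only in wind subscribes to a single fine-grained topic.
received = [t for t in published if topic_matches(BASE + "/wind", t)]
```

Each of the sub-topics above would need to be registered and described in discovery metadata, which is the publisher-side cost noted in the cons.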

Option 3: Put parameter information in the WIS Notification Message

Use an attribute in the WIS Notification Message to indicate physical parameter (or a list of parameters). Users would receive notifications for data from an entire model run, but they can do client-side filtering to identify which notification messages relate to data for the parameters they're interested in, and only download those files.

Pros: No changes required to WIS2 - the WIS Notification Message is extensible so we can add extra attributes, but this would benefit from standardisation.

Cons: Slightly more complex for data publishers - they need to add the parameter attribute into the notification message. Slightly more complex for users - they need to do client-side filtering to select data based on the parameter attribute.

Standardising the parameter attribute in the WIS Notification Message would be beneficial.
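The client-side filtering in Option 3 could look roughly like the sketch below. It assumes a not-yet-standardised `parameters` list under `properties`, made-up parameter names, and that the download URL is carried in a `links` entry with `rel` set to `canonical`; the function name is illustrative only.

```python
from typing import Optional


def url_if_wanted(message: dict, wanted: set) -> Optional[str]:
    """Return the download URL if the message is of interest, else None.

    Messages without a 'parameters' attribute are treated as of interest,
    because the client cannot tell what the file contains.
    """
    # 'parameters' is a hypothetical optional attribute, not a standardised one.
    parameters = message.get("properties", {}).get("parameters")
    if parameters is not None and not wanted.intersection(parameters):
        return None  # parameters are listed and none of them is wanted
    for link in message.get("links", []):
        if link.get("rel") == "canonical":
            return link.get("href")
    return None  # no download link found


# Example messages, trimmed to the fields this sketch uses.
wind_msg = {
    "properties": {"parameters": ["wind_speed"]},
    "links": [{"rel": "canonical", "href": "https://example.org/run00/wind.grib2"}],
}
rain_msg = {
    "properties": {"parameters": ["precipitation_amount"]},
    "links": [{"rel": "canonical", "href": "https://example.org/run00/rain.grib2"}],
}
untagged_msg = {
    "properties": {},
    "links": [{"rel": "canonical", "href": "https://example.org/run00/all.grib2"}],
}

wanted = {"wind_speed", "air_temperature"}
to_download = [
    url
    for url in (url_if_wanted(m, wanted) for m in (wind_msg, rain_msg, untagged_msg))
    if url is not None
]
```

With these inputs the client would fetch the wind file and the untagged file, and skip the precipitation file.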

Option 4: Sidecar files with metadata

Publish an "index" file alongside the GRIB files which says what data is in each GRIB file. NOAA and ECMWF already do this - but not in the same way. STAC (stacspec.org) provides a widely adopted and simple mechanism to provide structured metadata about file-based data assets (and note that the basic/mandatory content of a STAC item is pretty similar to that of a WIS Notification Message). Data publishers could publish a STAC Collection for the model run with a set of STAC items each of which refer to a GRIB file (or other data asset).

Pros: No changes required to WIS2 - the sidecar files are just more data items! Sidecar files contain more file-level metadata allowing users to be even more selective about what they download. Adopting a STAC-based approach would enable easy integration into a broad community of ecosystems (see this example for NOAA's HRRR data - stactools-package, stac-explorer). This would provide a generalised approach that enables data users to select only the data they need - whether downloading for local use or using directly in the cloud.

Cons: A new standardisation effort, probably within ET-Data, would be needed within WMO to adopt this approach - which would take time. Data publishers and data users would need to update their workflows to adopt.
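As a rough illustration of the sidecar idea in Option 4, the sketch below reads an index and picks the files worth downloading. The index format (one JSON object per line with `file` and `parameters` keys), the file names and the parameter names are all invented for this example; this is not the NOAA, ECMWF or STAC format.

```python
import json

# Hypothetical index content: one JSON object per line, one line per GRIB file.
INDEX_TEXT = """\
{"file": "run00_t.grib2", "parameters": ["air_temperature"]}
{"file": "run00_uv.grib2", "parameters": ["eastward_wind", "northward_wind"]}
{"file": "run00_tp.grib2", "parameters": ["precipitation_amount"]}
"""


def files_for(index_text: str, wanted: set) -> list:
    """Return the files whose listed parameters overlap the wanted set."""
    selected = []
    for line in index_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        if wanted.intersection(entry["parameters"]):
            selected.append(entry["file"])
    return selected


wind_files = files_for(INDEX_TEXT, {"eastward_wind"})
```

The user downloads the small index first, then only the GRIB files it points to. Without an agreed format, a client would need one such parser per publisher.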

Recommendation: Option 3 seems like the best balance to me. It would be optional for data publishers - if they didn't include a parameter attribute everything still works - albeit that users can't distinguish which data files relate to which parameters and would have to download everything. Users don't have to do client-side filtering - they have the option to filter or just download everything. Standardising the parameter attribute would drive consistency across the WIS2 ecosystem - which would help system designers/vendors provide server- and client-side systems with the parameter filtering capability.

Option 4 might be a good longer-term aim.

Personally, I think Option 1 is a poor choice and Option 2 creates a mess of too many topics and datasets.

6a6d74 commented 1 month ago

If we can agree the best way to publish NWP data into WIS2, the Guide should be updated with this information - plus any other changes that might be necessary (e.g., if we include an optional parameter attribute in the WIS2 Notification Message)

golfvert commented 1 month ago

I had a side discussion with Tom about this. France is having the same issue. We are putting our NWP products in "packages", to make the file size "decent". In our case, it is a mix between steps and parameters. I guess each NWP centre may want to define packages in its own way.

First, I don't think that domain-specific solutions should be in the guide, following the logic that WIS2 is "pipes". Having a common way for all NWP centres to present/describe the packages (if they choose to do packages) is important, though.

I'd also say that Option 3 is the right option. In the guide, we can suggest that, to enable client-side filtering, adding a section in the properties is the preferred method.

Then, we ask the WIPPS team (where the sublevel of the TH and metadata aspects were agreed) to discuss and agree what the filtering should look like. This is then included in a cookbook (https://github.com/wmo-im/wis2-cookbook) or something like that.

In short, we define the overall approach, they agree on the specifics. The result is not in the guide.

tomkralidis commented 1 month ago

+1 for Option 3. WMN properties.parameter (or actually properties.parameters[] for granules providing > 1 parameter) is the least disruptive and most valuable update. It's likely that this can be used for other domains as well, and can be "filterable" from a Global Replay Service perspective.

Given this would benefit multiple domains, I propose we add to WNM proper as an optional element.
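On the publisher side, the optional element might be added along these lines. This is only a sketch: the `parameters` key name follows the `properties.parameters[]` suggestion above but is not standardised, and the helper name, parameter names and message content are made up.

```python
import copy


def with_parameters(message: dict, parameters: list) -> dict:
    """Return a copy of a notification message with an optional parameters list.

    If no parameters are given the message is returned unchanged, so
    publishers that do not use the attribute are unaffected.
    """
    updated = copy.deepcopy(message)  # leave the caller's message untouched
    if parameters:
        updated.setdefault("properties", {})["parameters"] = list(parameters)
    return updated


# Example message, trimmed to the fields this sketch uses.
base_msg = {
    "properties": {"pubtime": "2023-09-14T00:00:00Z"},
    "links": [{"rel": "canonical", "href": "https://example.org/run00/uv.grib2"}],
}

tagged_msg = with_parameters(base_msg, ["eastward_wind", "northward_wind"])
untagged_msg = with_parameters(base_msg, [])
```

Using a list even for a single parameter keeps one code path for granules that carry one parameter and those that carry several.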

amilan17 commented 1 month ago

@sebvi @wmo-im/tt-nwpmd

amilan17 commented 1 month ago

@6a6d74 @golfvert Please see the decision in the TT-NWPMD for how to provide index files: https://github.com/wmo-im/tt-nwpmd/issues/13

"NWPMD meeting on 2023.09.14

TT-WISMD agree to include a link to the index file in the notification message. The index file itself won't be cached. TT-NWPMD won't define the format of the index file and agreed that each Centre can use its own format of the index file. The ticket https://github.com/wmo-im/tt-nwpmd/issues/13 is closed."

6a6d74 commented 1 month ago

The Met Office team will look into implementing Option 3 and will involve IBL in that discussion.

6a6d74 commented 1 month ago

@6a6d74 @golfvert Please see the decision in the TT-NWPMD for how to provide index files: wmo-im/tt-nwpmd#13

"NWPMD meeting on 2023.09.14

TT-WISMD agree to include a link to the index file in the notification message. The index file itself won't be cached. TT-NWPMD won't define the format of the index file and agreed that each Centre can use its own format of the index file. The ticket wmo-im/tt-nwpmd#13 is closed."

Good to see that TT-NWPMD/TT-WISMD have discussed index files. Noting that TT-NWPMD won't define the index file format, there's still an outstanding need for standardisation before we, the WWW community, could adopt this approach for operational weather prediction.

sebvi commented 3 weeks ago

At the time we discussed index files, there was no consensus on what the format could be, and it felt that spending time discussing it would slow down our work on defining the THs for weather. Agreeing on a common format is always difficult, as many NMSs already have their own way of indexing and are not necessarily keen on changing because it means development and allocating resources. At ECMWF, we provide indexes in the format produced by ecCodes.