WIS2 topic hierarchy, structure

markusfiebig commented 1 year ago

This issue refers to this Wiki page: https://github.com/wmo-im/et-acdm/wiki/ET-ACDM-input-for-development-of-WIS2-topic-hierarchy

In a hierarchical structure, the position of an item must be unambiguous. For gases, there are cases that could be placed under either greenhouse gases or reactive gases. Ozone is given a special role, even though it is a reactive gas. The question is whether the hierarchy should be based on WMO organisational history, or whether we try to clean up and use a structure guided by internal logic.
Relevant bodies, e.g. CF, are now talking about "aerosol particles" instead of aerosol, finding a compromise between different communities.

tomkralidis commented 1 year ago

Target is a first pass for our next meeting in April 2023.

tomkralidis commented 1 year ago

ET-ACDM 2023-06-08:

- reviewed/discussed initial proposal
- target is to finalize/agree on initial implementation at ET ACDM 4
- ACTION: provide updates/alternate proposals on https://github.com/wmo-im/et-acdm/wiki/ET-ACDM-input-for-development-of-WIS2-topic-hierarchy

cc @atverm @markusfiebig @gaochen-larc @ejwelton @sergimorenovalero @joergklausen (please tag others as needed).

joergklausen commented 1 year ago

ET-ACDM-6 discussed and elaborated further ....

NB: Definitions suggested in brackets added after the meeting.

"Our" level 8 to work out is "Atmospheric composition", with the following sub-categories:

observations [= observational data and data derived from observations with a physical stimulus. Model output is excluded.]
- gases
  - greenhouse gases
  - ozone
  - reactive gases (excluding stratospheric ozone)
- aerosol particles and clouds
- surface flux densities
  - precipitation chemistry
  - dry deposition
- emissions
- radiation and latent heat
  - UV radiation
- atmospheric processes and kinetics
analysis-prediction [=generally, products derived from analyzing observations, as well as predictions/forecasts]
- sand and dust
- air pollution
- fires
- ozone & UV radiation
- greenhouse gases
- total atmospheric deposition
advisories-warnings [=products specifically generated for the purpose of advisories and warnings]
- air pollution
- sand and dust
- fires
- ozone & UV radiation
- nuclear fallout
- volcanic activity

tomkralidis commented 1 year ago

Should we consider putting forth a draft? Suggested next steps would be to create a topic-hierarchy structure in this repository as part of the proposal for our initial entry into the WIS2 topic hierarchy. Thoughts?

joergklausen commented 1 year ago

> Should we consider putting forth a draft? Suggested next steps would be to create a topic-hierarchy structure in this repository as part of the proposal for our initial entry into the WIS2 topic hierarchy. Thoughts?

I agree, but I don't quite understand what you mean we should do in addition ...

tomkralidis commented 1 year ago

@joergklausen I've put forth a first pass in this branch for review based on the above.

ejwelton commented 1 year ago

I agree with the 3 topics observations, analysis-prediction, advisories-warnings.

Under observations, I feel strongly that aerosols and clouds sub-topic should be separated into separate sub-topics.

For observations, I feel that our ET can eventually flesh this out on our own. However, we need input from experts outside our ET for the analysis-prediction and advisories-warnings topics. For aerosol-related sub-topics I would suggest first getting input from Sara Basart (WMO). I can also meet with her in Geneva in 2 weeks if she is available (I will be there for another meeting).

gaochen-larc commented 10 months ago

What about water vapor? We can treat it as a meteorological variable, but it is also important to atmospheric chemistry.

markusfiebig commented 10 months ago

> What about water vapor? We can treat it as a meteorological variable, but it is also important to atmospheric chemistry.

For ACTRIS, we have now agreed to define:

- water vapour mass concentration, alt label: absolute humidity
- water vapour mass fraction, alt label: specific humidity
- water vapour liquid water saturation fraction, alt label: relative humidity with respect to water
- water vapour ice water saturation fraction, alt label: relative humidity with respect to ice

And we carry those both as meteorological and chemical variables. This is a very good example why a strict, unambiguous hierarchy will be difficult or impossible to achieve.

joergklausen commented 8 months ago

The topic has been migrated to https://github.com/wmo-im/wis2-topic-hierarchy/tree/et-acdm-topics/topic-hierarchy/earth-system-discipline/atmospheric-composition, i.e., the results of the discussion should be maintained there. Comments on the topic hierarchy should still be entered here ...

sbasart-wmo commented 8 months ago

Hello Jörg,

This week I had a meeting with some WMO colleagues to understand how the WIS2.0 system is organised and to provide some feedback on the proposal.

The following technical aspects of WIS2.0 are important to be aware of for the hierarchy design:

- WIS2.0 only considers NRT information. This means that operational systems submit files to WIS2.0 every day.
- The WIS2.0 hierarchy does not refer to the data itself; it is the notification system used to track the submission of the data files for operational users, i.e. their availability. This is why, for example, you can find weather > aviation, a category that includes all SYNOP and METAR.
- It is only possible to associate one notification with a file/system. This is, for me, the most important point because it has direct implications for the hierarchy design and is on top of the "user" requirement.
- No more than 4 levels in the hierarchy.
- Although it is not mandatory, it is suggested to align with the structure of the other disciplines in GitHub.

Considering these points, I would suggest the following structure under atmospheric-composition:

surface-based-observations: We need to take into consideration the operational workflow, which follows the structure of the GAW World Data Centres. That is, how the files are submitted to the Data Centres, because this is what I understand will be associated with the notification. Also, before defining the categories here, we would need to identify which datasets are potentially available in NRT. @ejwelton, as far as I understand, the definition of the parameters is not directly connected with the WIS2.0 protocol. The WIS2.0 metadata file lists all the parameters included in the file. The WMO standards/conventions are a separate discussion.

space-based-products: At the moment, there is no example for weather-space, but I would say that here we should consider the different satellite-retrieved products, which are associated with different datasets. For example, aod > modis-retrieval, or no2 > omi-retrieval. But this is something that we need to double-check with the WMO satellite group.

analysis-predictions
- analysis: Datasets of analysed atmospheric conditions for a specified time or period in the past (Example: ECMWF-CAMS)
- forecast: Datasets of expected atmospheric conditions for a specified time or period in the future, typically including the T+0 fields (Example: ECMWF-CAMS, BSC-MONARCH, CMA-CUACE,...)
- reanalysis: Datasets of a retrospective run for a specified time or period in the past (Example: ECMWF-CAMSRA, NASA-MERRA2,...)
- hindcast: Datasets constructing a record of past conditions

advisories-warnings
- sand-and-dust
- air-pollution (@all, tropospheric ozone is also considered here)
- wildfires
- nuclear-fallout (@Anna, we need to check whether there is already a notification in weather/aviation; if so, this should be removed, because there can be only one unique notification in the overall WIS2.0 system)
- volcanic-eruptions (@Anna, we need to check whether there is already a notification in weather/aviation; if so, this should be removed, because there can be only one unique notification in the overall WIS2.0 system)
- ultraviolet-radiation (@all, the alert is connected with the UV levels, not the ozone, and we need to check whether there is already a notification in weather; if so, this should be removed, because there can be only one unique notification in the overall WIS2.0 system)

markusfiebig commented 8 months ago

We should consider Sara's latest comment in this issue. Here, she says it is an absolute requirement to only have one point for a data stream in the discovery hierarchy. The fact that we only deal with RT data isn't really relevant. After all, we are defining a hierarchy for discovery purposes.

Having only one entry point per data stream means we need a 1:1 logic between data stream and hierarchy structure. To organize the hierarchy by topics cogently implies that there is a 1:n relation between data stream and hierarchy structure. Most of our data streams can be sorted under several topics. That leads to the logical conclusion that we cannot organize the hierarchy by topic, but need to use other concepts complying with the intrinsic logic of the data streams - for example variable matrix.
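
To make the 1:1 versus 1:n argument concrete, here is a minimal Python sketch. The data stream names, the topic paths and the "matrix" labels are hypothetical and chosen only for illustration; they are not part of any agreed hierarchy. The function simply flags the streams that would fit under more than one entry.

```python
from typing import Dict, List


def ambiguous_streams(candidates: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the data streams that fit under more than one hierarchy entry.

    A 1:1 hierarchy needs exactly one home per stream, so every stream
    returned here breaks that requirement.
    """
    return {
        stream: sorted(set(entries))
        for stream, entries in candidates.items()
        if len(set(entries)) > 1
    }


# Hypothetical streams sorted by topic: ozone fits two gas categories.
by_topic = {
    "surface-ozone": ["gases/ozone", "gases/reactive-gases"],
    "co2-mole-fraction": ["gases/greenhouse-gases"],
}

# The same hypothetical streams sorted by an intrinsic property (the matrix).
by_matrix = {
    "surface-ozone": ["gas-phase"],
    "co2-mole-fraction": ["gas-phase"],
}

print(ambiguous_streams(by_topic))
print(ambiguous_streams(by_matrix))
```

Under these assumptions, the topic-based sorting reports the ozone stream as ambiguous, while the sorting by an intrinsic property reports nothing.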

sbasart-wmo commented 8 months ago

> We should consider Sara's latest comment in this issue. Here, she says it is an absolute requirement to only have one point for a data stream in the discovery hierarchy. The fact that we only deal with RT data isn't really relevant. After all, we are defining a hierarchy for discovery purposes.
>
> Having only one entry point per data stream means we need a 1:1 logic between data stream and hierarchy structure. To organize the hierarchy by topics cogently implies that there is a 1:n relation between data stream and hierarchy structure. Most of our data streams can be sorted under several topics. That leads to the logical conclusion that we cannot organize the hierarchy by topic, but need to use other concepts complying with the intrinsic logic of the data streams - for example variable matrix.

@markusfiebig I've tried to include some examples, but I still need to figure out how to proceed with the surface-based observations. Who else could potentially use these observations? This is the main question we need to answer to create the hierarchy, because it determines how the file/system notifications are grouped for a given application. Considering potential "daily" users, we can consider:

- modellers (i.e. assimilation/evaluation in daily forecasting systems) = all types of parameters
- aviation forecasters (profiles)
- air quality stakeholders (i.e. surface concentrations of PM2.5, PM10, O3, NO2, SO2, CO)

Now, the only example of a GAW NRT dataset I have in mind is MPLNet (i.e., Judd), which provides aerosol profiles. You could put it in a category called "aviation", but the profiles can also be used by "modellers"; this is my main doubt. A clean and simple solution is to split it into three categories: surface, column-integrated and profiles. What do you think?

markusfiebig commented 8 months ago

@sbasart-wmo, your proposal would use the observation geometry as the sorting criterion, which would be an option since it allows for an unambiguous distinction of observations.

gaochen-larc commented 8 months ago

What about aircraft-based in-situ measurements? Should these measurements be classified in the surface category? There are also aircraft spiral profile measurements... Also, what about balloon sonde or dropsonde profile measurements? Many profile measurements would give both profile and column-integrated quantities.

sbasart-wmo commented 7 months ago

@gaochen-larc If you check other examples, you will see that aircraft_observations can be considered a different category from surface_based_observations. We can follow the same criteria for balloon_based_observations.

amilan17 commented 7 months ago

> After all, we are defining a hierarchy for discovery purposes.

@markusfiebig - the WIS metadata record (WCMP2) is for discovery. The topic hierarchy is essentially a channel for receiving notifications after the dataset you want has been discovered. The channel has minimal meaning. It's kind of like a radio station channel, e.g. AM or FM with specific call numbers.

markusfiebig commented 7 months ago

@amilan17 - in the practical use cases, the topic hierarchy will essentially be an abbreviated version of discovery metadata. If it was "only a name", we might call the channels "Donald Duck" etc, which we obviously aren't doing. If the user needs to know the exact location of a product in the topic hierarchy in order to find it, we are creating a system that only a few core experts can use. We should avoid that.

tomkralidis commented 7 months ago

We need to strike a balance: there must be a clear way to delineate the topics that datasets can be published to for event-driven workflows. At the same time, the WIS2 Topic Hierarchy is not a taxonomy or knowledge organization system per se, and the key workflow is:

publisher publishes WIS2 metadata for a dataset
consumer discovers and assesses/evaluates dataset on WIS2 Global Discovery Catalogue for use
where applicable, user subscribes to data notifications via MQTT via a given topic
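
The subscription step in the workflow above can be sketched in Python. This is a simplified, pure-Python illustration of how standard MQTT topic filters (`+` for one level, `#` for all remaining levels) select topics; it ignores special cases such as `$`-prefixed topics. The function name `topic_matches`, the centre id `example-centre` and the example topic are assumptions made for illustration, not taken from a WIS2 specification.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic.

    '+' matches exactly one level. '#' matches the parent level and any
    number of levels below it, and is only valid as the last filter level.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # The multi-level wildcard must be the final level of the filter.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


# Hypothetical topic a data centre might publish notifications to.
published = (
    "origin/a/wis2/example-centre/data/core/"
    "atmospheric-composition/advisories-warnings/air-pollution"
)

# A consumer interested in all advisories from any centre and data policy.
subscription = "origin/a/wis2/+/data/+/atmospheric-composition/advisories-warnings/#"

print(topic_matches(subscription, published))
```

A real consumer would hand the same filter string to the subscribe call of an MQTT client library; the matching itself is then done by the broker.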

amilan17 commented 7 months ago

Sara's proposal summarized as topics:

origin/a/wis2/{centre-id}/data/{core or recommended}/atmospheric-composition/surface-based-observations
origin/a/wis2/{centre-id}/data/{core or recommended}/atmospheric-composition/space-based-products
origin/a/wis2/{centre-id}/data/{core or recommended}/atmospheric-composition/predictions
origin/a/wis2/{centre-id}/data/{core or recommended}/atmospheric-composition/advisories-warnings/sand-dust
origin/a/wis2/{centre-id}/data/{core or recommended}/atmospheric-composition/advisories-warnings/air-pollution
origin/a/wis2/{centre-id}/data/{core or recommended}/atmospheric-composition/advisories-warnings/wildfires

Note that we ONLY need topics for near/real-time data notifications.
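
As a rough illustration, the topic pattern summarized above can be assembled and checked with a few lines of Python. The helper name `build_topic` and the centre id `example-centre` are hypothetical, and the set of sub-topics only mirrors the proposal in this thread; it is a sketch, not an agreed or official list.

```python
# Data policies named in the topic pattern above.
DATA_POLICIES = {"core", "recommended"}

# Sub-topics below atmospheric-composition, following the proposal in this
# thread. The third entry is listed as "predictions" in the summary and is
# named "analysis-predictions" in the original proposal.
SUBTOPICS = {
    "surface-based-observations",
    "space-based-products",
    "analysis-predictions",
    "advisories-warnings/sand-dust",
    "advisories-warnings/air-pollution",
    "advisories-warnings/wildfires",
}


def build_topic(centre_id: str, policy: str, subtopic: str) -> str:
    """Assemble a topic string and reject values outside the proposal."""
    if policy not in DATA_POLICIES:
        raise ValueError(f"unknown data policy: {policy!r}")
    if subtopic not in SUBTOPICS:
        raise ValueError(f"unknown sub-topic: {subtopic!r}")
    return (
        f"origin/a/wis2/{centre_id}/data/{policy}/"
        f"atmospheric-composition/{subtopic}"
    )


print(build_topic("example-centre", "core", "advisories-warnings/wildfires"))
```

Keeping the allowed sub-topics in one place makes it easy to see that each notification maps to exactly one topic string.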

sbasart-wmo commented 7 months ago

@amilan17 the third topic is analysis-predictions, not predictions.

amilan17 commented 7 months ago

> For example, aod > modis-retrieval, or no2 > omi-retrieval. But this is something that we need to double-check with the WMO satellite group.

@sbasart-wmo - CGMS is proposing all operational satellites for weather and space-weather. I see no issue with repeating the same structure for relevant satellites under atmospheric-composition.

amilan17 commented 7 months ago

> what about aircraft-based in-situ measurements?

@gaochen-larc we should consider these datatypes as surface-based-observations.

tomkralidis commented 3 months ago

As discussed at ET-ACDM 2024-08-13, the current proposal can be found in https://github.com/wmo-im/wis2-topic-hierarchy/tree/et-acdm-topics/topic-hierarchy/earth-system-discipline/atmospheric-composition

As a next step, each data centre is to provide the topic(s) it would publish to using the proposed hierarchy.

joergklausen commented 3 months ago

Issue closed with reference to #24 where a sort of 'implementation test' is open for contributions.

wmo-im / et-acdm

WIS2 topic hierarchy, structure #20