metno / mmd

GNU General Public License v3.0

Need for a new element #130

Open steingod opened 3 years ago

steingod commented 3 years ago

There is a need for a new element in MMD. We have two major flows of metadata.

  1. From internal structures (data produced and/or maintained internally)
  2. From external data centres through metadata harvesting

When we harvest information from external sources, we rely at all times on the performance and implementations of the partner data centres. These do not always use the same metadata schemes as we do, and information is exchanged using international standards. This creates some trouble in light of the MMD requirements: some elements are not covered by international standards, and some elements are not populated correctly. It is not an option to stop harvesting, so it might be necessary to distinguish between these two sources of information. To simplify maintenance of our catalogues, I suggest that we add a new element, along the lines of

<mmd:metadata_source>Keyword</mmd:metadata_source>

where Keyword is taken from a list like:

The metadata generation process can then set this element during generation.
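
A minimal sketch of how a generation step might set the proposed element, using Python's standard `xml.etree.ElementTree`. The namespace URI, the helper name `set_metadata_source`, and the keyword values `internal` and `external` are assumptions for illustration only; the actual keyword list is not defined in this thread.

```python
import xml.etree.ElementTree as ET

# Assumption: the MMD namespace URI; verify against the schema in use.
MMD_NS = "http://www.met.no/schema/mmd"
ET.register_namespace("mmd", MMD_NS)

# Hypothetical keyword list; the real vocabulary is not defined in this thread.
ALLOWED_SOURCES = {"internal", "external"}


def set_metadata_source(root: ET.Element, keyword: str) -> ET.Element:
    """Add or update a single mmd:metadata_source child on the root element."""
    if keyword not in ALLOWED_SOURCES:
        raise ValueError(f"unknown metadata source: {keyword!r}")
    tag = f"{{{MMD_NS}}}metadata_source"
    element = root.find(tag)
    if element is None:
        # No existing element, so append a new one to the record.
        element = ET.SubElement(root, tag)
    element.text = keyword
    return element


# A deliberately tiny record; real MMD files carry many more elements.
record = ET.fromstring(
    f'<mmd:mmd xmlns:mmd="{MMD_NS}">'
    "<mmd:metadata_identifier>example-id</mmd:metadata_identifier>"
    "</mmd:mmd>"
)
set_metadata_source(record, "external")
print(ET.tostring(record, encoding="unicode"))
```

Updating an existing element instead of always appending keeps the record at one `metadata_source` value if the generation step runs more than once.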

@mortenwh and @ferrighi what are your opinions?

mortenwh commented 3 years ago

That sounds good but I think the external option has two alternatives:

  1. External harvest from netcdf/thredds/..
  2. External harvest from metadata catalog

The last one is preferred as long as we can align the metadata specifications, isn't it?

ferrighi commented 3 years ago

The metadata harvesting is only done from metadata endpoints. Metadata creation from a single netCDF or traversing a thredds catalogue is something different. So I agree with the two options, either internal, meaning we handle our own metadata, or external, meaning we get metadata from external sources, which we do not own.

mortenwh commented 3 years ago

I'm not sure if I understand - but are you saying that the metadata catalogs also typically have the same issues with elements that, e.g., are not populated correctly? And because of that, it doesn't matter whether we harvest directly from netcdf/thredds or the catalogs? Wouldn't that result in duplicates?

steingod commented 3 years ago

I would say that we have similar issues whether we harvest using discovery metadata standards (ofc CSW/OAI-PMH/OpenSearch with ISO19115 in various flavours, GCMD DIF, or some other) or using e.g. THREDDS harvesting (generation of discovery metadata from ACDD++ elements).

As to the question of whether there are issues with metadata catalogues: yes, definitely. Standards are always subject to interpretation, and the result also depends on how much information data centres have requested from their data providers. If they have requested too little, they serve too little. As long as information is generated outside our structure, we need to be aware of this. I would, however, suggest adding a keyword to identify when we harvest from THREDDS catalogues as well.
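
One way the distinction discussed above could be encoded is sketched below: a small mapping from the harvest channel to a source keyword, with a separate value for THREDDS. The enum name, the keyword strings, and the `classify_source` helper are all hypothetical; nothing here reflects an agreed vocabulary.

```python
from enum import Enum


class MetadataSource(Enum):
    # Hypothetical keyword values, chosen only for illustration.
    INTERNAL = "internal"
    EXTERNAL_CATALOGUE = "external-catalogue"
    EXTERNAL_THREDDS = "external-thredds"


# Harvest protocols named in the thread that serve discovery metadata.
CATALOGUE_PROTOCOLS = {"csw", "oai-pmh", "opensearch"}


def classify_source(origin: str) -> MetadataSource:
    """Map a harvest channel name to a metadata source keyword."""
    key = origin.strip().lower()
    if key == "internal":
        return MetadataSource.INTERNAL
    if key == "thredds":
        return MetadataSource.EXTERNAL_THREDDS
    if key in CATALOGUE_PROTOCOLS:
        return MetadataSource.EXTERNAL_CATALOGUE
    raise ValueError(f"unrecognised harvest channel: {origin!r}")


for channel in ("internal", "OAI-PMH", "THREDDS"):
    print(channel, "->", classify_source(channel).value)
```

Keeping THREDDS as its own value lets catalogue maintenance treat metadata generated from ACDD elements differently from metadata harvested over discovery standards, while both remain recognisably external.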

ferrighi commented 2 years ago

Reviving this, also in light of the recent updates to the internal MMD files. Unfortunately, we clearly have different levels of richness in our metadata. If we add this element, I would also suggest that we expose the vocabularies on vocab.met.no with a definition such as: