Open steingod opened 3 years ago
That sounds good but I think the external option has two alternatives:
The last one is preferred as long as we can align the metadata specifications, isn't it?
The metadata harvesting is only done from metadata endpoints. Metadata creation from a single netCDF or traversing a thredds catalogue is something different. So I agree with the two options, either internal, meaning we handle our own metadata, or external, meaning we get metadata from external sources, which we do not own.
I'm not sure if I understand - but are you saying that the metadata catalogs also typically have the same issues with elements that, e.g., are not populated correctly? And because of that, it doesn't matter whether we harvest directly from netcdf/thredds or the catalogs? Wouldn't that result in duplicates?
I would say that we have similar issues whether we harvest using discovery metadata standards (CSW/OAI-PMH/OpenSearch with ISO19115 in various flavours, GCMD DIF or some other) or through e.g. THREDDS harvesting (generation of discovery metadata from ACDD++ elements).
As to the question of whether there are issues with metadata catalogues: yes, definitely. Standards are always subject to interpretation, and the result also depends on how much information data centres have requested from their data providers. If they have requested too little, they serve too little. As long as information is generated outside our structure, we need to be aware of this. I would however suggest adding a keyword to identify when we harvest from THREDDS catalogues as well.
Reviving this, also in light of the recent updates of internal mmd files. Unfortunately, we clearly have different levels of richness in metadata. If we add this element, I would also suggest that we expose the vocabulary at vocab.met.no with a definition such as:
There is a need for a new element in MMD. We have two major flows of metadata.
When we harvest information from external sources, we always rely on the performance and implementations of the partner data centres. These do not always use the same metadata schemes as we do, and information is exchanged using international standards. This creates some trouble in light of the MMD requirements: some elements are not covered by international standards, and some elements are not populated correctly. It is not an option to stop harvesting, so it might be necessary to distinguish between these two sources of information. In order to simplify maintenance of our catalogues, I suggest that we add a new element, along the lines of
<mmd:metadata_source>Keyword</mmd:metadata_source>
where Keyword is taken from a list like:
The metadata generation process can then set this element during generation.
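To illustrate how a generation step could set the element, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helper name `set_metadata_source`, the namespace URI, and the keywords `internal` and `external` are assumptions made for illustration only; the actual keyword list is still to be agreed.

```python
import xml.etree.ElementTree as ET

# Assumed MMD namespace URI; adjust to whatever the schema actually declares.
MMD_NS = "http://www.met.no/schema/mmd"
ET.register_namespace("mmd", MMD_NS)

# Hypothetical keyword list, only for illustration.
ALLOWED_SOURCES = {"internal", "external"}


def set_metadata_source(root, keyword):
    """Set (or overwrite) mmd:metadata_source as a direct child of the record root."""
    if keyword not in ALLOWED_SOURCES:
        raise ValueError(f"Unknown metadata_source keyword: {keyword!r}")
    tag = f"{{{MMD_NS}}}metadata_source"
    elem = root.find(tag)
    if elem is None:
        # Element not present yet, so append it to the record.
        elem = ET.SubElement(root, tag)
    elem.text = keyword
    return elem


# A deliberately tiny record, not a complete or valid MMD document.
record = ET.fromstring(
    f'<mmd:mmd xmlns:mmd="{MMD_NS}">'
    "<mmd:metadata_identifier>example-id</mmd:metadata_identifier>"
    "</mmd:mmd>"
)

set_metadata_source(record, "external")
xml_out = ET.tostring(record, encoding="unicode")
```

A harvester would call this with the external keyword, while the internal generation flow would use the internal one, so that catalogue maintenance can filter on the element later.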
@mortenwh and @ferrighi what are your opinions?