metno / mmd

GNU General Public License v3.0

support for parent in mmd-to-geonorge.xsl #173

Closed ferrighi closed 1 year ago

ferrighi commented 2 years ago

To support display of data series for INSPIRE records:

The issue would be how to identify a parent dataset. We do not have something that can be parsed in the mmd record during translation.

ferrighi commented 2 years ago

We will solve this for now by passing the file_name parameter into the xslt. When the file_name contains the string "parent" the xslt will produce a series output. To pass the file_name:

  1. xsltproc --stringparam file_name filename
  2. for etree: output = transform(xml_doc, file_name=etree.XSLT.strparam("filename"))
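
As a rough illustration of the rule described above, the sketch below builds an xsltproc argument list and mirrors the "parent" substring check in plain Python. The helper names and the example paths are hypothetical, and passing only the base name of the file is an assumption; the stylesheet's actual test is not shown in this thread.

```python
from __future__ import annotations

from pathlib import Path


def xsltproc_args(stylesheet: str, mmd_file: str) -> list[str]:
    """Build an xsltproc command line that passes the MMD file name as file_name.

    Assumption: only the base name of the file is passed to the stylesheet.
    """
    return [
        "xsltproc",
        "--stringparam", "file_name", Path(mmd_file).name,
        stylesheet,
        mmd_file,
    ]


def produces_series(file_name: str) -> bool:
    """Mirror the proposed rule: a file name containing 'parent' yields series output."""
    return "parent" in file_name
```

Note that a plain substring check also matches names such as "transparent.xml", which is one reason a file-name convention is a weak signal.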

mortenwh commented 2 years ago

It is controversial to depend on the filename. We need to consider if we can add a new field in MMD to identify if a dataset is a "parent dataset". See also https://github.com/metno/discovery-metadata-catalog-ingestor/issues/90

mortenwh commented 2 years ago

@ferrighi @steingod - where are we regarding this?

ferrighi commented 2 years ago

As we do not have a way to identify a parent in MMD, and you cannot include "_parent" in the MMD file name (which is not a very strong solution anyway), I think the only way to proceed is to add an element such as "mmd:hierarchy" or "mmd:level". We used to identify datasets as Level-1 (parent) and Level-2 (children) because of the SolR implementation we had before, but this no longer applies. In ISO the element is hierarchyLevel, which uses the code list MD_ScopeCode; in our case the possible values are 'dataset' for granule level and 'series' for collection level. I would only use the value "series" (or we could use "parent" to be consistent with our naming), assuming that when this element is absent we have a simple dataset. Such a dataset can also be a child, but that would be identified through the presence of related_dataset.
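
A minimal sketch of how such an element could be mapped to the ISO hierarchyLevel value, assuming the proposed element ends up being called mmd:hierarchy and sits directly under the record root. The element name, the accepted values, the namespace URI and the function name are all assumptions made for illustration; nothing here reflects an agreed MMD schema change.

```python
import xml.etree.ElementTree as ET

# Assumed MMD namespace URI; adjust if the schema uses a different one.
MMD_NS = {"mmd": "http://www.met.no/schema/mmd"}


def iso_hierarchy_level(mmd_xml: str) -> str:
    """Return the MD_ScopeCode value ('series' or 'dataset') for an MMD record.

    Hypothetical rule: a top-level mmd:hierarchy element with the value
    'series' (or 'parent') marks a parent; anything else is a plain dataset.
    """
    root = ET.fromstring(mmd_xml)
    level = root.findtext("mmd:hierarchy", default="", namespaces=MMD_NS)
    return "series" if level.strip().lower() in {"series", "parent"} else "dataset"
```

With this rule, a record without the element falls back to 'dataset', which matches the suggestion that the absence of the element means a simple dataset.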

steingod commented 2 years ago

The approach we are using is compliant with how other data centres are handling this, although these do not rely much on ISO19115. We are currently harvesting parent/child datasets from NPI, PANGAEA, NERSC and more data centres, and the way we handle this in our system is by backtracing when ingesting etc. We are not actively using file names with parent; that has been more of a convenience approach for ingestion. The challenge of adding more MMD fields is to ensure consistency between parent and children. Backtracing from children has proven robust, but I see that it won't work in this context unless proper software is built around the translation. However, adding fields and ensuring consistency would probably also need some dedicated software.
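
The backtracing idea can be sketched roughly as follows: scan the child records and collect the identifiers they point to as parents. This is only an illustration, not the ingestion code used in the system. The related_dataset element is mentioned in this thread, but the relation_type attribute, the namespace URI, the function name and the example identifiers are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed MMD namespace URI; adjust if the schema uses a different one.
MMD_NS = {"mmd": "http://www.met.no/schema/mmd"}


def parent_ids(records):
    """Collect identifiers that any record references as its parent.

    Assumes children carry <mmd:related_dataset relation_type="parent">ID</...>.
    """
    ids = set()
    for xml_text in records:
        root = ET.fromstring(xml_text)
        for rel in root.findall("mmd:related_dataset", MMD_NS):
            if rel.get("relation_type") == "parent" and rel.text:
                ids.add(rel.text.strip())
    return ids
```

A record whose identifier appears in the returned set would be treated as a parent. Because this needs access to the children, it cannot be done inside a single-record XSLT translation without extra software around it.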

mortenwh commented 1 year ago

Kartverket is eager to get this issue solved. Any chance we can meet and discuss it?

steingod commented 1 year ago

I have some time available next week. I will then be away for a week.

johtoblan commented 1 year ago

@TAlonglong should this be closed now also?

TAlonglong commented 1 year ago

I'm not sure. I think we agreed to do a test building the dmci container with this branch of MMD, because there are some other changes in MMD master that might affect things.

So that means I have to change https://github.com/metno/discovery-metadata-catalog-ingestor/blob/main/container/Dockerfile#L13 to use this branch until we have a new release of MMD.