metno / mmd

GNU General Public License v3.0

Make data_access and storage_information mandatory in mmd_strict.xsd #214

Closed shamlymajeed closed 1 year ago

shamlymajeed commented 1 year ago
 Closes #
 - Made 'data_access' and 'storage_information' mandatory
 - Made 'resource' in 'data_access_type' mandatory
 - Made 'file_name' and 'file_location' in 'storage_information_type' mandatory
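
As a rough illustration of what the stricter schema would enforce, here is a minimal Python sketch that checks an MMD document for the elements listed above. It is not the XSD change itself, nor dmci code. The namespace URI and the helper name `missing_mandatory` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Assumed MMD namespace URI; adjust if the files use a different one.
NS = {"mmd": "http://www.met.no/schema/mmd"}

# Parent element -> children that the PR description lists as mandatory.
REQUIRED = {
    "data_access": ["resource"],
    "storage_information": ["file_name", "file_location"],
}


def missing_mandatory(xml_text):
    """Return the mandatory elements that are absent from the document.

    Hypothetical helper: it only mirrors the presence rules described in
    the PR and does not replace validation against mmd_strict.xsd.
    """
    root = ET.fromstring(xml_text)
    problems = []
    for parent, children in REQUIRED.items():
        blocks = root.findall(f"mmd:{parent}", NS)
        if not blocks:
            # The whole block is missing, so its children are not checked.
            problems.append(parent)
            continue
        for block in blocks:
            for child in children:
                if block.find(f"mmd:{child}", NS) is None:
                    problems.append(f"{parent}/{child}")
    return problems
```

An empty list means all of the listed elements are present. A file with no `data_access` block would be reported as `["data_access"]`.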
mortenwh commented 1 year ago

I've checked out your branch and tested the dmci ingestion, but it does not reject the files that should fail:

# Should pass
$ curl --data-binary "@arome3km_3hr_202112.xml" localhost:5000/v1/validate
Everything is OK
$ curl --data-binary "@arome3km_3hr_202112.xml" localhost:5000/v1/insert
Everything is OK
$ rm -rf workdir/arch_1

# Should fail:
$ curl --data-binary "@arome3km_3hr_202112_nodap.xml" localhost:5000/v1/validate
Everything is OK
$ curl --data-binary "@arome3km_3hr_202112_nodap.xml" localhost:5000/v1/insert
Everything is OK
$ rm -rf workdir/arch_1

# Should fail:
$ curl --data-binary "@arome3km_3hr_202112_nodap_nostorage.xml" localhost:5000/v1/validate
Everything is OK
$ curl --data-binary "@arome3km_3hr_202112_nodap_nostorage.xml" localhost:5000/v1/insert
Everything is OK

Did it work for you? Can you try with the same files? They are attached, and you'll need to remove the .txt part of the filenames.

arome3km_3hr_202112.xml.txt arome3km_3hr_202112_nodap.xml.txt arome3km_3hr_202112_nodap_nostorage.xml.txt

ferrighi commented 1 year ago

What is the reasoning behind this PR? Is it linked to an issue?

mortenwh commented 1 year ago

> What is the reasoning behind this PR? Is it linked to an issue?

Yes, https://github.com/metno/discovery-metadata-catalog-ingestor/issues/144

mortenwh commented 1 year ago

In addition to the changed xsd file, I'm wondering if we should check that the access and storage paths actually exist. This should be done in dmci.
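
A minimal sketch of what such an existence check could look like for the storage part, assuming the file is reachable on the local filesystem of the machine running the check. The namespace URI and the function names are hypothetical and are not taken from dmci. Checking the `data_access` resources would need a network request and is left out here.

```python
import os
import xml.etree.ElementTree as ET

# Assumed MMD namespace URI; adjust if the files use a different one.
NS = {"mmd": "http://www.met.no/schema/mmd"}


def storage_paths(xml_text):
    """Yield file_location joined with file_name for each storage_information block."""
    root = ET.fromstring(xml_text)
    for block in root.findall("mmd:storage_information", NS):
        location = block.findtext("mmd:file_location", default="", namespaces=NS)
        name = block.findtext("mmd:file_name", default="", namespaces=NS)
        yield os.path.join(location.strip(), name.strip())


def missing_storage_paths(xml_text):
    """Return the storage paths that do not point to an existing file."""
    return [path for path in storage_paths(xml_text) if not os.path.isfile(path)]
```

An ingestor could reject a record when `missing_storage_paths` returns a non-empty list, although whether rejection is the right response depends on the withdrawal and parent-record cases discussed in this thread.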

ferrighi commented 1 year ago

Just a comment about the data access. We should keep in mind that we want to implement the FAIR principle that metadata remain accessible when the data are no longer available. This means that we will still have mmd files for those records, but the data access can/should be empty if the data have been withdrawn. Also, I recall that not all mmd parent records have a data access: when they only refer to a collection of data files (i.e. not aggregated), they just have a landing page in which the children are listed.

mortenwh commented 1 year ago

We have agreed that this will be too strict, so closing...