pangaea-data-publisher / fuji

FAIRsFAIR Research Data Object Assessment Service
MIT License

[Bug]: FsF-R1-01MD check does not consider dynamic web service APIs #513

Open MarioLocati opened 3 weeks ago

MarioLocati commented 3 weeks ago

Description

We are trying to improve the evaluation of the following two DOIs:

In addition to the DataCite metadata associated with the DOIs, we provide TC211 metadata, respectively:

Data from both can be downloaded using Open GeoSpatial Consortium (OGC) web services standards:

These standards support multiple data encoding formats. The user may request any of the supported formats and may also apply a custom filter that reduces the amount of data returned by the web service.

The "FsF-R1-01MD - Metadata specifies the content of the data" check seems to expect a "file type" and a "file size", but there is no way to provide them for web services: many output formats are supported, and the size of the output varies with the selected output format and with any data filter applied.
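
To illustrate why a single file type and size cannot be declared, here is a minimal sketch of two requests against the same OGC WFS endpoint. The endpoint URL and layer name are hypothetical placeholders, and the output formats actually available depend on the server; the query parameters are the standard WFS 2.0 key-value parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and layer name, for illustration only.
WFS_ENDPOINT = "https://example.org/geoserver/wfs"
LAYER = "example:faults"


def build_getfeature_url(output_format, bbox=None, count=None):
    """Build a WFS 2.0 GetFeature URL using key-value parameters."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": LAYER,
        "outputFormat": output_format,
    }
    # Optional spatial filter: restricts the features returned.
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    # Optional cap on the number of features returned.
    if count is not None:
        params["count"] = str(count)
    return WFS_ENDPOINT + "?" + urlencode(params)


# Same dataset, two very different responses in format and size.
full_gml = build_getfeature_url("application/gml+xml; version=3.2")
small_json = build_getfeature_url(
    "application/json", bbox=(10.0, 40.0, 15.0, 45.0), count=100
)
print(full_gml)
print(small_json)
```

Both URLs point at the same dataset, yet neither the media type nor the byte size of the response is fixed in advance.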

Expected Behavior

If the metadata points to web services for downloading the data, the check should pass in full, even when file type and file size are not declared in the metadata.

Actual Behavior

The "FsF-R1-01MD" check fails because neither the "file size" nor the "file type" is specified in the metadata.

Possible Fix

The "FsF-R1-01MD" check should be able to identify the presence of download web services, at least the most widely used ones such as those of the Open GeoSpatial Consortium (OGC). In Europe, these are the preferred way to publish spatial data and the way recommended by the INSPIRE Directive (Infrastructure for Spatial Information in the European Community).

Steps to reproduce

Run an assessment using one of the DOIs provided above.

huberrob commented 3 weeks ago

Dear Mario,

I think I understand the problem. You are delivering data via services (OGC) instead of data objects (datasets), and unfortunately F-UJI was built exclusively to support data objects. I know the difference seems subtle, but it explains the results.
On the other hand, some major standards such as DCAT or ISO do not differentiate much between data and their transport methods, so this might justify changes to F-UJI.

I could therefore imagine implementing an additional test that checks for two service-specific metadata properties: protocol and service endpoint. In your example, these are specifically (and well) described using gmd:transferOptions, which would provide the necessary metadata. Instead of testing for file size and file type, F-UJI could test for the presence of these properties.
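
As a rough illustration of what such a test might look like (a sketch only, not F-UJI's actual implementation), the snippet below looks for online resources under gmd:transferOptions in an ISO 19139 record and keeps those that declare both a protocol and an endpoint URL. The function name, the sample record, and the example.org endpoint are hypothetical; the namespaces are the standard ISO 19139 gmd and gco namespaces.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespaces.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def find_service_endpoints(iso_xml):
    """Return (protocol, url) pairs declared under gmd:transferOptions."""
    root = ET.fromstring(iso_xml)
    endpoints = []
    path = ".//gmd:transferOptions//gmd:CI_OnlineResource"
    for online in root.findall(path, NS):
        url = online.findtext(
            "gmd:linkage/gmd:URL", default="", namespaces=NS
        ).strip()
        protocol = online.findtext(
            "gmd:protocol/gco:CharacterString", default="", namespaces=NS
        ).strip()
        # Both properties must be present for the resource to count.
        if url and protocol:
            endpoints.append((protocol, url))
    return endpoints


# Hypothetical, heavily trimmed ISO 19139 record for illustration.
SAMPLE = """<gmd:MD_Metadata
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:distributionInfo><gmd:MD_Distribution><gmd:transferOptions>
    <gmd:MD_DigitalTransferOptions><gmd:onLine><gmd:CI_OnlineResource>
      <gmd:linkage><gmd:URL>https://example.org/wfs</gmd:URL></gmd:linkage>
      <gmd:protocol>
        <gco:CharacterString>OGC:WFS</gco:CharacterString>
      </gmd:protocol>
    </gmd:CI_OnlineResource></gmd:onLine></gmd:MD_DigitalTransferOptions>
  </gmd:transferOptions></gmd:MD_Distribution></gmd:distributionInfo>
</gmd:MD_Metadata>"""

print(find_service_endpoints(SAMPLE))  # [('OGC:WFS', 'https://example.org/wfs')]
```

A non-empty result could then stand in for the file size and file type properties when the data is delivered through a service.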

I know that GeoInquire and, for example, FAIR-EASE are currently trying to improve FAIR for GEO. Are you also involved via EPOS? It would be good to solve this issue within a broader community.

Robert

MarioLocati commented 3 weeks ago

Dear Robert, it is great that an additional check for the existence of web service endpoint(s) could be introduced. Please keep this issue updated on your progress.

It is worth mentioning that we do provide a link to the ISO 19115/19139-TC211 metadata in the DataCite DOI record: see the "relatedIdentifiers" tag with a "relationType" set to "HasMetadata". This relation is present in both the XML and JSON versions provided by DataCite services, but is missing from their JSON-LD output. For the same two example DOIs as above, respectively:

XML: https://api.datacite.org/application/vnd.datacite.datacite+xml/10.6092/ingv.it-ahead
JSON: https://api.datacite.org/application/vnd.datacite.datacite+json/10.6092/ingv.it-ahead
JSON-LD: https://api.datacite.org/application/vnd.schemaorg.ld+json/10.6092/ingv.it-ahead

XML: https://api.datacite.org/application/vnd.datacite.datacite+xml/10.13127/efsm20
JSON: https://api.datacite.org/application/vnd.datacite.datacite+json/10.13127/efsm20
JSON-LD: https://api.datacite.org/application/vnd.schemaorg.ld+json/10.13127/efsm20
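
A minimal sketch of how the "HasMetadata" relation could be picked out of an already-downloaded DataCite JSON record. It assumes the record exposes a "relatedIdentifiers" list whose entries carry "relatedIdentifier" and "relationType" keys, as in the DataCite JSON output; the function name and the sample record are hypothetical.

```python
def metadata_links(datacite_record):
    """Return related identifiers whose relationType is "HasMetadata"."""
    related = datacite_record.get("relatedIdentifiers") or []
    return [
        item.get("relatedIdentifier")
        for item in related
        if item.get("relationType") == "HasMetadata"
        and item.get("relatedIdentifier")
    ]


# Hypothetical record fragment, for illustration only.
sample_record = {
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "https://example.org/metadata/iso19139.xml",
            "relatedIdentifierType": "URL",
            "relationType": "HasMetadata",
        },
        {
            "relatedIdentifier": "10.1234/example",
            "relatedIdentifierType": "DOI",
            "relationType": "IsCitedBy",
        },
    ]
}

print(metadata_links(sample_record))
```

Only the first entry is returned, since it is the only one with relationType "HasMetadata".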

Yes, I am involved in a broader geosciences community: Geo-INQUIRE (reference persons: Laurentiu Danciu @danciul and Javier Quinteros @javiquinte) and EPOS (reference persons: Rossana Paciello @rpaciello, Kety Giuliacci @Kety20 and Daniele Bailo @danielebailo). In addition, I am the coordinator of the INGV Data Management Office, which runs the DOI service associated with our Data Registry. For any dataset published at INGV (DOI prefixes "10.13127" and "10.6092/ingv.it") we therefore have full control over the DataCite and ISO 19115/19139/TC211 metadata, whereas our control over the DCAT-AP output will be improved as soon as possible.