opengeospatial / ogcapi-records

An open standard for the discovery of geospatial resources on the Web.
https://ogcapi.ogc.org/records

Align with Standard Harvesting Protocol like OAI-PMH (probably an extension) #50

Closed uvoges closed 1 year ago

uvoges commented 4 years ago

The requirement here is to align the OAPIR specification with requirements originating from harvesting use cases and from existing legacy standard harvesting protocols.

Please note: the thread "OAPIR Harvesting" is related to the definition of harvesting tasks (to let the catalogs harvest external resources) and so has a different use-case.

I cross-checked against the outdated OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) harvesting specification (https://www.openarchives.org/OAI/openarchivesprotocol.html), which is implemented in different geo contexts, e.g. the European Data Portal and WMO/WIS.

  1. Protocol Services, Requests and Responses of OAI-PMH

Here I'll list the services, requests and responses of OAI-PMH and how they are covered by OAPIR.

1.1 OAI-PMH - ListRecords

This verb is used to harvest records from a repository. Optional arguments permit selective harvesting of records based on set membership and/or datestamp.
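As a rough illustration of selective harvesting, a ListRecords request can be built as below. The argument names (verb, metadataPrefix, from, until, set) come from the OAI-PMH specification; the base URL and the helper name `list_records_url` are made up for this sketch.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://example.org/oai"


def list_records_url(metadata_prefix, from_=None, until=None, set_spec=None):
    """Build an OAI-PMH ListRecords URL with optional selective-harvesting arguments."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Only include the optional arguments that were actually given.
    if from_ is not None:
        params["from"] = from_
    if until is not None:
        params["until"] = until
    if set_spec is not None:
        params["set"] = set_spec
    return f"{BASE_URL}?{urlencode(params)}"


url = list_records_url("oai_dc", from_="2020-01-01", set_spec="geo")
```

Here `url` would request all `oai_dc` records of the (hypothetical) set `geo` changed since 2020-01-01.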

Depending on the repository's support for deletions, a returned header may have a status attribute of "deleted" if a record matching the arguments specified in the request has been deleted. No metadata will be present for records with deleted status.

If a record is no longer available then it is said to be deleted. Repositories must declare one of three levels of support for deleted records in the deletedRecord element of the Identify response:

no - the repository does not maintain information about deletions. A repository that indicates this level of support must not reveal a deleted status in any response.

persistent - the repository maintains information about deletions with no time limit. A repository that indicates this level of support must persistently keep track of the full history of deletions and consistently reveal the status of a deleted record over time.

transient - the repository does not guarantee that a list of deletions is maintained persistently or consistently. A repository that indicates this level of support may reveal a deleted status for records.

If a repository does not keep track of deletions then such records will simply vanish from responses and there will be no way for a harvester to discover deletions through continued incremental harvesting. If a repository does keep track of deletions then the datestamp of the deleted record must be the date and time that it was deleted. Responses to GetRecord (see below) request for a deleted record must then include a header with the attribute status="deleted", and must not include metadata or about parts. Similarly, responses to selective harvesting requests with set membership and date range criteria that include deleted records must include the headers of these records. Incremental harvesting will thus discover deletions from repositories that keep track of them.
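To make the incremental-harvesting consequence concrete, here is a minimal sketch of how a harvester might apply returned headers to its local copy. The dict shape of a header and the function name `apply_headers` are assumptions for this example, not something defined by OAI-PMH.

```python
def apply_headers(store, headers):
    """Apply harvested record headers to a local store keyed by identifier.

    Each header is assumed to be a dict with 'identifier', 'datestamp' and an
    optional 'status' key (an in-memory shape chosen for this sketch only).
    """
    for header in headers:
        if header.get("status") == "deleted":
            # The repository tracks deletions: drop our local copy.
            store.pop(header["identifier"], None)
        else:
            # New or modified record: remember its latest datestamp.
            store[header["identifier"]] = header["datestamp"]
    return store


local = {"a": "2020-01-01", "b": "2020-01-02"}
harvested = [
    {"identifier": "a", "datestamp": "2020-02-01", "status": "deleted"},
    {"identifier": "c", "datestamp": "2020-02-02"},
]
apply_headers(local, harvested)
```

After this run `local` no longer contains `a` and has gained `c`. With a repository that does not track deletions, `a` would simply never show up in `harvested` and would stay in the local store.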

Deleted status is a property of individual records. Like a normal record, a deleted record is identified by a unique identifier, a metadataPrefix and a datestamp. Other records, with different metadataPrefix but the same unique identifier, may remain available for the item.

status: an optional status attribute with a value of "deleted" indicates the withdrawal of availability of the specified metadata format for the item, dependent on the repository's support for deletions. This means that accessing this item would return a JSON snippet indicating that, and when, the item was deleted.

Evaluation of OAPIR regarding the requirements: Browsing through the items of one or more specific collections, requesting the desired response format, and harvesting selectively based on datestamp should be sufficient. But a "deleted" flag must be provided in case the item is tagged as deleted. Supporting the "deleted" use case means that the minimal response shall define only identifier and status as mandatory; otherwise we need an additional response format.
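A minimal response for a deleted item could look roughly like the output of the sketch below. The field names `id`, `status` and `datestamp` are purely hypothetical here; nothing in OAPIR defines them for this purpose yet.

```python
import json


def deleted_item(identifier, deleted_at):
    """Return a minimal JSON body for a deleted record.

    Only the identifier and the status are treated as mandatory; the field
    names are assumptions made for this illustration.
    """
    body = {"id": identifier, "status": "deleted"}
    if deleted_at is not None:
        # Optional: when the record was deleted.
        body["datestamp"] = deleted_at
    return json.dumps(body, sort_keys=True)


snippet = deleted_item("record-1", "2020-10-01T12:00:00Z")
```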

1.2 OAI-PMH - ListIdentifiers

This verb is an abbreviated form of ListRecords, retrieving only headers rather than records. Optional arguments permit selective harvesting of headers based on set membership and/or datestamp. Depending on the repository's support for deletions, a returned header may have a status attribute of "deleted" if a record matching the arguments specified in the request has been deleted.

Evaluation of OAPIR regarding the requirements: Browsing through items with a specific reduced response; in my opinion our record model is slim enough. But a "deleted" flag must be provided in case the item is tagged as deleted.

1.3 OAI-PMH - GetRecord

This verb is used to retrieve an individual metadata record from a repository. Required arguments specify the identifier of the item from which the record is requested and the format of the metadata that should be included in the record. Depending on the level at which a repository tracks deletions, a header with a "deleted" value for the status attribute may be returned, in case the metadata format specified by the metadataPrefix is no longer available from the repository or from the specified item.
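For reference, this is roughly how a harvester would detect the "deleted" status in a GetRecord response. The XML namespace and the header/identifier/datestamp elements are those of OAI-PMH 2.0; the sample document is hand-written and trimmed (a real response also carries responseDate and request elements), and the identifier in it is invented.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written, trimmed sample response for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header status="deleted">
        <identifier>oai:example.org:record-1</identifier>
        <datestamp>2020-10-01T12:00:00Z</datestamp>
      </header>
    </record>
  </GetRecord>
</OAI-PMH>"""


def read_header(xml_text):
    """Extract identifier, datestamp and deleted flag from a GetRecord response."""
    root = ET.fromstring(xml_text)
    header = root.find("./oai:GetRecord/oai:record/oai:header", OAI_NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
        # The status attribute is optional; it is absent for live records.
        "deleted": header.get("status") == "deleted",
    }


header_info = read_header(SAMPLE)
```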

Evaluation of OAPIR regarding the requirements: Covered by /Item. We need to check whether all mandatory DC response elements are covered by the OAPIR response. Not covered: a status element whose "deleted" value would specify that the metadataPrefix is no longer available from the repository or from the specified item.

1.4 OAI-PMH - ListMetadataFormats

This verb is used to retrieve the metadata formats available from a repository. An optional argument restricts the request to the formats available for a specific item.

identifier: an optional argument that specifies the unique identifier of the item for which available metadata formats are being requested. If this argument is omitted, then the response includes all metadata formats supported by this repository. Note that the fact that a metadata format is supported by a repository does not mean that it can be disseminated from all items in the repository.

Evaluation of OAPIR regarding the requirements: Not sure if we should support different encodings for different items. I guess not.

1.5 OAI-PMH - Identify

Used to retrieve information about a repository. Repositories may also employ the Identify verb to return additional descriptive information.

Evaluation of OAPIR regarding the requirements: Mostly covered by the OpenAPI response. But the following are not included in the OpenAPI response:

earliestDatestamp: a UTCdatetime that is the guaranteed lower limit of all datestamps recording changes, modifications, or deletions in the repository. A repository must not use datestamps lower than the one specified by the content of the earliestDatestamp element. earliestDatestamp must be expressed at the finest granularity supported by the repository.

deletedRecord: the manner in which the repository supports the notion of deleted records. Legitimate values are no, transient and persistent, with meanings defined in the section on deletion.

granularity: the finest harvesting granularity supported by the repository. The legitimate values are YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ, with meanings as defined in ISO8601.
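The three missing properties and their legitimate values can be captured in a small validating structure, sketched below. The allowed values are taken from the OAI-PMH text quoted here; the class name `HarvestInfo` and its Python field names are assumptions for illustration and say nothing about how OAPIR would expose them.

```python
from dataclasses import dataclass

# Legitimate values as listed in the OAI-PMH Identify description.
DELETED_RECORD_LEVELS = {"no", "transient", "persistent"}
GRANULARITIES = {"YYYY-MM-DD", "YYYY-MM-DDThh:mm:ssZ"}


@dataclass(frozen=True)
class HarvestInfo:
    """Hypothetical container for the Identify properties not covered so far."""

    earliest_datestamp: str
    deleted_record: str
    granularity: str

    def __post_init__(self):
        # Reject values outside the sets defined by OAI-PMH.
        if self.deleted_record not in DELETED_RECORD_LEVELS:
            raise ValueError(f"unknown deletedRecord level: {self.deleted_record}")
        if self.granularity not in GRANULARITIES:
            raise ValueError(f"unknown granularity: {self.granularity}")


info = HarvestInfo("2015-01-01", "persistent", "YYYY-MM-DD")
```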

1.6 OAI-PMH - ListSets

This verb is used to retrieve the set structure of a repository, useful for selective harvesting.

Evaluation of OAPIR regarding the requirements: I guess this is fully covered by iterating over and querying the collections.

  2. Summary of questions on required changes / extensions to the OAPIR protocol

Questions:

How to express in the API landing page (OpenAPI) which of the three levels of support for deleted records applies. I guess we should define the "deleted" attribute based on a metadata record (unique id) and not based on the encoding (unique id + format).

How to express earliestDatestamp: a UTCdatetime that is the guaranteed lower limit of all datestamps recording changes, modifications, or deletions in the repository. A repository must not use datestamps lower than the one specified by the content of the earliestDatestamp element. earliestDatestamp must be expressed at the finest granularity supported by the repository.

How to express granularity: the finest harvesting granularity supported by the repository. The legitimate values are YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ with meanings as defined in ISO8601.
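To show what the two granularities mean in practice, here is a small sketch that renders a timestamp at either one. The two patterns are the legitimate OAI-PMH values; the mapping to `strftime` formats and the function name `format_datestamp` are choices made for this example.

```python
from datetime import datetime, timezone

# Map each OAI-PMH granularity to an equivalent strftime pattern.
FORMATS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "YYYY-MM-DDThh:mm:ssZ": "%Y-%m-%dT%H:%M:%SZ",
}


def format_datestamp(moment, granularity):
    """Render a timezone-aware datetime as a UTC datestamp at the given granularity."""
    return moment.astimezone(timezone.utc).strftime(FORMATS[granularity])


moment = datetime(2020, 11, 2, 9, 30, 0, tzinfo=timezone.utc)
day = format_datestamp(moment, "YYYY-MM-DD")
second = format_datestamp(moment, "YYYY-MM-DDThh:mm:ssZ")
```

Here `day` is `2020-11-02` and `second` is `2020-11-02T09:30:00Z`. A repository advertising day granularity would only accept and emit the first form.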

pvretano commented 4 years ago

Related: issue #48

pvretano commented 4 years ago

2020-11-02: This would be an extension.

pvretano commented 1 year ago

07-APR-2023: Closing as a duplicate of #48.