ufal / clarin-dspace

clarin-dspace digital repository based on DSpace and LINDAT/CLARIN DSpace
http://lindat.cz
BSD 3-Clause "New" or "Revised" License

healthcheck: Number of records in vlo is null #876

Open cyplas opened 6 years ago

cyplas commented 6 years ago

Our recent weekly healthchecks haven't been finding our entries in VLO:

#### VLO check [took: 0s] [# lines: 4]
Number of records in vlo is null (we have 118 items in our repository).
The records were harvested at null.
It contains 0 errors.
Results gathered from http://catalog.clarin.eu/oai-harvester/Slovenian_language_resource_repository_CLARIN_SI.html

The problem is not specific to our installation; it is caused by changes in the harvester. From Slack:

indeed, there is a new OAI harvest viewer and I think these static pages are no longer supported. There is probably a new way of checking this, I think best via the new API.

This is the call that can give you the relevant information (in a different format, but easier to parse). However, as a generic solution it would only work if you first look up the endpoint ID.

https://vlo.clarin.eu/oai-harvest-viewer-api/v2/oai/_table/endpoint_record?offset=0&limit=1000&include_count=true&filter=(metadataPrefix%3D%27cmdi%27)+AND+(endpoint%3D25)+AND+(harvest%3D3)&api_key=00551c93af07a0e2c22628ad6214b9ab250cdfa82a5be2fc04789920e27a7170&_=1534405723436 (list of all endpoints via https://vlo.clarin.eu/oai-harvest-viewer-api/v2/oai/_table/endpoint_info?offset=0&limit=250&include_count=true&filter=(harvest%3D3)&api_key=00551c93af07a0e2c22628ad6214b9ab250cdfa82a5be2fc04789920e27a7170&order=name+ASC&_=1534405723433)
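A minimal sketch of how the two calls above could be assembled for an arbitrary endpoint, assuming the query parameters behave as they appear in the quoted URLs. The function names are hypothetical, the trailing `_` parameter is left out on the assumption that it is only a cache-busting timestamp, and the `"id"` and `"name"` keys in `find_endpoint_id` are guesses at the response layout that would need checking against a real response.

```python
from urllib.parse import urlencode

# Base path copied from the URLs quoted above.
API_BASE = "https://vlo.clarin.eu/oai-harvest-viewer-api/v2/oai/_table"


def endpoint_info_url(api_key, harvest=3, limit=250):
    """Build the URL that lists all endpoints of one harvest, ordered by name."""
    params = {
        "offset": 0,
        "limit": limit,
        "include_count": "true",
        "filter": f"(harvest={harvest})",
        "order": "name ASC",
        "api_key": api_key,
    }
    return f"{API_BASE}/endpoint_info?{urlencode(params)}"


def endpoint_record_url(api_key, endpoint_id, harvest=3, prefix="cmdi", limit=1000):
    """Build the URL that lists the harvested records of one endpoint."""
    record_filter = (
        f"(metadataPrefix='{prefix}') AND (endpoint={endpoint_id}) "
        f"AND (harvest={harvest})"
    )
    params = {
        "offset": 0,
        "limit": limit,
        "include_count": "true",
        "filter": record_filter,
        "api_key": api_key,
    }
    return f"{API_BASE}/endpoint_record?{urlencode(params)}"


def find_endpoint_id(rows, name):
    """Return the ID of the endpoint called `name`, or None if it is absent.

    ASSUMPTION: `rows` is a list of dicts with "id" and "name" keys. The real
    field names are not documented in this thread.
    """
    for row in rows:
        if row.get("name") == name:
            return row.get("id")
    return None
```

The healthcheck would first fetch `endpoint_info_url(...)`, pass the decoded rows to `find_endpoint_id`, and then fetch `endpoint_record_url(...)` with the resulting ID. `urlencode` percent-encodes the parentheses in the filter, which is equivalent to the literal form in the URLs above.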

twagoo commented 6 years ago

Note that we have implemented a redirect so that you now get, for example,

http://catalog.clarin.eu/oai-harvester/Slovenian_language_resource_repository_CLARIN_SI.html -> https://vlo.clarin.eu/data/Slovenian_language_resource_repository_CLARIN_SI.html -> https://vlo.clarin.eu/data/clarin/results/cmdi/Slovenian_language_resource_repository_CLARIN_SI/

which ends in a listing of the harvest result files if and only if an endpoint with the provided name exists.

Note that we cannot provide this as a supported interface; both the (nature of the) response content and the URL may change. We hope to be able to provide an API with a well-defined contract in the future. For now, however, this solution might mitigate the issue.
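As a rough illustration of how a healthcheck could use the redirect, the sketch below requests the old URL and reports where it ends up. It assumes the redirects are ordinary HTTP redirects (which `urllib.request.urlopen` follows automatically) and that a missing endpoint produces an HTTP error status; neither is confirmed in this thread. The names `legacy_url` and `resolve_listing` are hypothetical.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# Old static-page location, as used in the healthcheck output above.
LEGACY_BASE = "http://catalog.clarin.eu/oai-harvester"


def legacy_url(endpoint_name):
    """Build the old-style URL for an endpoint name (underscores, no suffix)."""
    return f"{LEGACY_BASE}/{endpoint_name}.html"


def resolve_listing(endpoint_name, opener=urlopen):
    """Follow the redirect chain and return the final URL, or None.

    ASSUMPTION: an unknown endpoint name results in an HTTP error status
    (for example 404), which urlopen raises as HTTPError.
    `opener` is injectable so the function can be exercised without network access.
    """
    try:
        with opener(legacy_url(endpoint_name)) as response:
            return response.geturl()
    except HTTPError:
        return None
```

Calling `resolve_listing("Slovenian_language_resource_repository_CLARIN_SI")` would be expected to return the `/data/clarin/results/cmdi/...` URL from the chain above while the redirect stays in place. Since the interface is unsupported, a `None` result should be treated as "unknown" rather than as proof that the endpoint was not harvested.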