Closed SigurdG closed 3 years ago
I cannot reproduce this; for me, the resumption token reports 4404 records: https://pure.itu.dk/ws/oai?verb=ListRecords&metadataPrefix=ddf-mxd
In [1]: from sickle import Sickle
In [2]: s = Sickle("https://pure.itu.dk/ws/oai")
In [3]: len(list(s.ListRecords(metadataPrefix='ddf-mxd')))
Out[3]: 4404
I have been working on retrieving all records from the OAI-PMH repositories of various research institutions using the Sickle library in Python. I have written code that harvests the repositories one after another, iterating over the records of each repository and saving them both as an XML file and in an SQL database. Below is an excerpt of the code that performs this harvesting for the OAI repository of a smaller research institution.
However, for some reason I am unable to retrieve all the records in the repositories. In the example below for one institution, I am only able to retrieve around 2,900 records even though the completeListSize was 4,041 the last time I checked. If I use the from parameter and perform a series of selective harvests by date in a loop, I am able to retrieve some additional records, but not all of them.
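A minimal sketch of the date-windowed selective harvest described above, assuming monthly windows and day-granularity datestamps. The names `month_windows`, `harvest_by_window`, and `empty_errors` are hypothetical and not part of Sickle; `list_records` is any callable that accepts OAI-PMH arguments as keywords, such as a Sickle client's `ListRecords`. Records are keyed by identifier so that a record returned by more than one window is counted once.

```python
from datetime import date, timedelta


def month_windows(start, end):
    """Yield (from, until) ISO date strings covering start..end, one per calendar month."""
    current = start
    while current <= end:
        # First day of the month after `current`.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        until = min(next_month - timedelta(days=1), end)
        yield current.isoformat(), until.isoformat()
        current = next_month


def harvest_by_window(list_records, start, end,
                      metadata_prefix="ddf-mxd", empty_errors=()):
    """Harvest window by window and return records keyed by identifier.

    `empty_errors` is a tuple of exception types that mean
    "no records in this window" (an assumption about the client in use).
    """
    seen = {}
    for from_, until in month_windows(start, end):
        # 'from' is a Python keyword, so the arguments go through a dict.
        params = {"metadataPrefix": metadata_prefix,
                  "from": from_, "until": until}
        try:
            for record in list_records(**params):
                seen[record.header.identifier] = record
        except empty_errors:
            continue  # empty window, move on to the next one
    return seen
```

With Sickle this might be called as `harvest_by_window(s.ListRecords, date(2000, 1, 1), date.today(), empty_errors=(NoRecordsMatch,))`, with `NoRecordsMatch` imported from `sickle.oaiexceptions`. The start date here is only a placeholder. Comparing `len(seen)` with the count from a single unwindowed harvest shows whether windowing recovers anything.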
The OAI interface appears to be sending back an empty resumptionToken, indicating that all records have been retrieved, and therefore no errors are raised. I suspect the issue might be that some of the records in the OAI repository are somehow empty or incomplete, and that the program therefore believes that all records in the repository have been retrieved. A similar but not identical issue with resumptionTokens has been raised in #25, but in that case Sickle raised an error.
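One way to test this suspicion is to look at the raw response pages rather than the parsed records. The sketch below uses only the standard library; `inspect_page` is a hypothetical helper, and it assumes the page is a standard OAI-PMH 2.0 `ListRecords` response. It reports how many records a page contains, how many are flagged as deleted or carry no `metadata` element, and what the `resumptionToken` element says.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def inspect_page(xml_text):
    """Summarise one ListRecords response page."""
    root = ET.fromstring(xml_text)
    container = root.find(f"{OAI_NS}ListRecords")
    records = [] if container is None else container.findall(f"{OAI_NS}record")

    deleted = 0
    empty = 0
    for record in records:
        header = record.find(f"{OAI_NS}header")
        if header is not None and header.get("status") == "deleted":
            deleted += 1
        elif record.find(f"{OAI_NS}metadata") is None:
            empty += 1  # not deleted, but no metadata element either

    token = root.find(f".//{OAI_NS}resumptionToken")
    has_token = token is not None
    return {
        "records": len(records),
        "deleted": deleted,
        "empty": empty,
        # An empty string means the server declares the list complete.
        "token": (token.text or "").strip() if has_token else None,
        "complete_list_size": token.get("completeListSize") if has_token else None,
        "cursor": token.get("cursor") if has_token else None,
    }
```

If the page that carries the empty token has a `cursor` plus record count well below `complete_list_size`, the server ended the list early. If the totals match but many records are deleted or empty, the difference comes from how those records are counted on the client side.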
Would it be possible to solve the issue by adding a parameter that skips an empty record, issues a repeat request, or something along those lines?
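Until such a parameter exists, skipping could be done on the client side. The following is only a sketch: `partition_records` is a hypothetical helper, not a Sickle feature, and it assumes record objects that expose `deleted` and `metadata` attributes in the way Sickle's `Record` does. It separates usable records from deleted or empty ones and tallies what was skipped, so the numbers can be compared with completeListSize.

```python
def partition_records(records):
    """Split records into usable ones and a tally of skipped ones."""
    kept = []
    skipped = {"deleted": 0, "empty": 0}
    for record in records:
        if getattr(record, "deleted", False):
            skipped["deleted"] += 1
        elif not getattr(record, "metadata", None):
            # Missing attribute, None, or an empty mapping all count as empty.
            skipped["empty"] += 1
        else:
            kept.append(record)
    return kept, skipped
```

For example, `kept, skipped = partition_records(s.ListRecords(metadataPrefix='ddf-mxd'))` followed by `len(kept) + sum(skipped.values())` gives the total the client actually received. Sickle's `ListRecords` also accepts `ignore_deleted=True`, which drops deleted records before they reach the caller, so that flag alone can make the harvested count smaller than completeListSize.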