Closed edsu closed 1 year ago
Hi @edsu, apologies for the long delay in getting back to you on this. Looking at the implementation, it does seem to support a combination of resumptionToken and until, as it re-uses the logic for ListIdentifiers before fetching the records requested. Please see the code snippet below:
def listRecords(self, metadataPrefix, resumptionToken=None,
                from_=None, until=None):
    """Get a list of header, metadata and about information on records.

    Args:
        metadataPrefix (string): identifies metadata set to retrieve
        resumptionToken (string): the resumptionToken

    Should raise error.CannotDisseminateFormatError if metadataPrefix
    is not supported by the repository.

    Should raise error.NoSetHierarchyError if the repository does not
    support sets.

    Returns:
        string: the response
    """
    root = self.getRootLxmlNamespace()
    request = ET.Element(
        'request',
        verb='ListRecords')
    request.text = BASE_URL
    request.attrib['metadataPrefix'] = metadataPrefix
    root.append(request)
    listRecords = ET.Element('ListRecords')
    start = int(resumptionToken)
    identifiers_data = self.solr.get_list_identifiers(start, from_, until)
    identifiers = identifiers_data['docs']
    numFound = identifiers_data['numFound']
    if ((start + SOLR_ROWS) > numFound):
        raise ErrorHandler(ErrorCode.BAD_RESUMPTIONTOKEN, None)
    for identifier in identifiers:
        language = self.get_language_from_identifier(identifier['id'])
        if not language:
            raise ErrorHandler(ErrorCode.ID_DOES_NOT_EXIST, None)
        ead_data, image_path, source_content_type, userestrict = \
            self.solr.get_metadata(identifier['id'][:-3], language)
        if language == 'en':
            source_content_type_en = source_content_type
        else:
            source_content_type_en = \
                self.solr.get_metadata(identifier['id'][:-3], 'en')[2]
        if not ead_data:
            raise ErrorHandler(ErrorCode.ID_DOES_NOT_EXIST,
                               None,
                               {'verb': 'ListRecords',
                                'identifier': identifier})
        mods = self.GetRecordData(identifier['id'][:-3], language,
                                  ead_data, metadataPrefix, image_path,
                                  source_content_type,
                                  source_content_type_en, userestrict)
        listRecords.append(mods)
    resumptionToken_element = ET.Element('resumptionToken')
    resumptionToken_element.attrib['completeListSize'] = str(numFound)
    resumptionToken_element.attrib['cursor'] = str(start)
    if (start + SOLR_ROWS) != numFound:
        resumptionToken_element.text = str(start + SOLR_ROWS) + \
            metadataPrefix
    listRecords.append(resumptionToken_element)
    root.append(listRecords)
    return ET.tostring(root, pretty_print=True, xml_declaration=True,
                       encoding='utf-8').decode()

if querystring['verb'].lower() == 'listrecords':
    logging.info("Listing records...")
    if 'resumptiontoken' in querystring:
        if 'metadataprefix' in querystring:
            raise ErrorHandler(ErrorCode.BAD_ARGUMENT, event)
        else:
            return OaiPmh.listRecords(
                metadataPrefix,
                resumptionToken,
                querystring.get('from', None),
                querystring.get('until', None))
    elif 'metadataprefix' in querystring:
        return OaiPmh.listRecords(
            querystring['metadataprefix'],
            querystring.get('resumptiontoken', 0),
            querystring.get('from', None),
            querystring.get('until', None))
    else:
        raise ErrorHandler(ErrorCode.BAD_ARGUMENT, event)

It seems that this is a problem with the OAI server in question and not a problem with Sickle, so I'm closing this.
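
For context on why the server side matters here: the token built in the snippet above is just the next cursor followed by the metadataPrefix, so the original from/until values are not carried along, and a harvester that follows the specification will not resend them. One possible way for a server to keep the selective-harvest window across pages is to store it inside the token itself. The following is only a sketch under that assumption; `encode_token` and `decode_token` are hypothetical helper names and are not part of the implementation quoted above.

```python
import base64
import json


def encode_token(cursor, metadata_prefix, from_=None, until=None):
    """Pack the paging state into an opaque, URL-safe token (hypothetical helper)."""
    state = {
        "cursor": cursor,
        "metadataPrefix": metadata_prefix,
        "from": from_,
        "until": until,
    }
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token):
    """Recover the paging state; raise ValueError for a malformed token."""
    try:
        raw = base64.urlsafe_b64decode(token.encode("ascii"))
        state = json.loads(raw)
    except ValueError as exc:  # covers base64, JSON, and encoding errors
        raise ValueError("badResumptionToken") from exc
    if not isinstance(state, dict) or "cursor" not in state:
        raise ValueError("badResumptionToken")
    return state


# The server would issue this token with the first page of results...
token = encode_token(10, "mods_no_ocr", until="2019-10-15T19:00:00Z")
# ...and read the window back when the harvester returns it on its own.
state = decode_token(token)
```

With something along these lines, the handler could pass `state["from"]` and `state["until"]` to the Solr query on every page without requiring the client to send anything besides the resumptionToken.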
I'm not sure if this is a problem with a particular OAI endpoint I am working with, or with Sickle (although I'm leaning towards the former). I'm trying to selectively harvest an endpoint using an `until` timestamp:

When I run this I see:

The timestamps for the records clearly show that the server isn't respecting the `until` value as it uses the `resumptionToken`. But I noticed that if I manually craft a URL that includes `until` with the `resumptionToken` that it seems to work properly, since it returns the next 10 records in the set of 52?

https://api.qdl.qa/api/oaipmh?resumptionToken=10mods_no_ocr&verb=ListRecords&until=2019-10-15T19%3A00%3A00Z

My understanding from the specification is that calls to `ListRecords` with the `resumptionToken` shouldn't include `until` because `resumptionToken` is exclusive? So it appears that Sickle is behaving properly and the server is broken?

Any help confirming this conclusion would be greatly appreciated.
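
To make the exclusivity rule concrete, here is a minimal standard-library sketch of how a specification-following client would build the two kinds of `ListRecords` request. It is not Sickle's code; `list_records_url` is a hypothetical helper, and the `mods_no_ocr` prefix is an assumption inferred from the token in the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.qdl.qa/api/oaipmh"  # endpoint taken from the URL above


def list_records_url(base_url, resumption_token=None, **selective):
    """Build a ListRecords URL, treating resumptionToken as an exclusive argument.

    `selective` holds the other arguments (metadataPrefix, from, until, set).
    """
    params = {"verb": "ListRecords"}
    if resumption_token is not None:
        if selective:
            # A follow-up request carries the token and nothing else.
            raise ValueError(
                "resumptionToken is exclusive; drop: " + ", ".join(sorted(selective))
            )
        params["resumptionToken"] = resumption_token
    else:
        if "metadataPrefix" not in selective:
            raise ValueError("metadataPrefix is required on the first request")
        params.update(selective)
    return base_url + "?" + urlencode(params)


# First request: selective harvesting arguments are allowed here.
first = list_records_url(
    BASE_URL, metadataPrefix="mods_no_ocr", until="2019-10-15T19:00:00Z"
)
# Follow-up request: only the token, so the server has to remember `until`.
next_page = list_records_url(BASE_URL, resumption_token="10mods_no_ocr")
```

The manually crafted URL above corresponds to the combination this helper rejects, which is why it works against this server but is not something a compliant harvester would send.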
PS. Thank you for a rock solid and extensible OAI-PMH library!