ucldc / rikolti

calisphere harvester 2.0
BSD 3-Clause "New" or "Revised" License

[data provider issue] `'NoneType' object has no attribute 'findall'` #939

Closed gamontoya closed 5 months ago

gamontoya commented 6 months ago
christinklez commented 6 months ago

This is failing at the "fetch" task.

OAI: https://digital.library.ucla.edu/catalog/oai?verb=ListRecords&metadataPrefix=oai_dpla&set=member_of_collection_ids_ssim:2grzc200zz-89112

Digital collection: https://digital.library.ucla.edu/catalog?f%5Bmember_of_collections_ssim%5D%5B%5D=Fowler+Museum

barbarahui commented 6 months ago

The third page of OAI results for this collection has no <ListRecords> element. The 2nd page includes a resumption token, but the 3rd page it points to contains no records, only the noRecordsMatch error below. This is a problem with their feed: it shouldn't give us a resumption token if there isn't a subsequent page.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/assets/blacklight_oai_provider/oai2-f5bbeed678ad7f0e8aa64ed892344db51dbb4beca820d135ec51e711df82385e.xsl"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
    <responseDate>2024-05-14T22:46:11Z</responseDate>
    <request>https://ursus.library.ucla.edu/catalog/oai?verb=Identify</request>
    <error code="noRecordsMatch">The combination of the values of the from, until, set and metadataPrefix arguments results in an empty list.</error>
</OAI-PMH>

You can also see the error if you go to the endpoint we're harvesting from:

https://digital.library.ucla.edu/catalog/oai?verb=ListRecords&metadataPrefix=oai_dpla&set=member_of_collection_ids_ssim:2grzc200zz-89112

Click Resume at the bottom of the page to go to the 2nd page, then click it again to go to the 3rd page.
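
For anyone reproducing this locally, here is a minimal sketch of how a page without <ListRecords> leads to the error in the issue title. Rikolti's actual fetcher code is not shown in this thread, so `extract_records` is a hypothetical stand-in, and `THIRD_PAGE` is a trimmed copy of the response quoted above.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed copy of the third page quoted above (stylesheet and schema attributes dropped).
THIRD_PAGE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
    <responseDate>2024-05-14T22:46:11Z</responseDate>
    <request>https://ursus.library.ucla.edu/catalog/oai?verb=Identify</request>
    <error code="noRecordsMatch">The combination of the values of the from, until, set and metadataPrefix arguments results in an empty list.</error>
</OAI-PMH>"""


def extract_records(page_xml):
    """Hypothetical unguarded parser: assumes <ListRecords> is always present."""
    root = ET.fromstring(page_xml)
    # find() returns None when the element is missing, as on the error page.
    list_records = root.find("oai:ListRecords", OAI_NS)
    return list_records.findall("oai:record", OAI_NS)


try:
    extract_records(THIRD_PAGE)
except AttributeError as exc:
    failure = str(exc)
    print(failure)
```

The `find()` call returns `None` for this page, so the chained `findall()` fails before any record is read.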

barbarahui commented 6 months ago

It would be easier to identify the specific problem if we caught the errors in the OAI response, as per this issue: https://github.com/ucldc/rikolti/issues/660
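
One possible shape for that check, as a rough sketch only: look for an OAI-PMH <error> element before touching <ListRecords>, and raise an exception that carries the error code. The names `OAIResponseError` and `parse_oai_page` are assumptions made for illustration, not existing Rikolti code, and the approach in issue #660 may differ.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


class OAIResponseError(Exception):
    """Hypothetical exception carrying the OAI-PMH error code and message."""

    def __init__(self, code, message):
        super().__init__(f"OAI-PMH error {code}: {message}")
        self.code = code
        self.message = message


def parse_oai_page(page_xml):
    """Return the <record> elements of a page, or raise if the feed reports an error."""
    root = ET.fromstring(page_xml)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        raise OAIResponseError(error.get("code"), (error.text or "").strip())
    list_records = root.find("oai:ListRecords", OAI_NS)
    if list_records is None:
        # No error element and no records: treat the page as empty.
        return []
    return list_records.findall("oai:record", OAI_NS)


# Usage with a shortened error page like the one quoted earlier in this thread.
error_page = (
    b'<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    b'<error code="noRecordsMatch">empty list</error>'
    b"</OAI-PMH>"
)
try:
    parse_oai_page(error_page)
except OAIResponseError as exc:
    caught = exc
    print(exc)
```

With a check like this, the fetch task would fail with the feed's own error code rather than an AttributeError.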

christinklez commented 6 months ago

Thanks @barbarahui for figuring out the issue with the OAI! We'll reach out to the folks at UCLA to let them know what we observed.

christinklez commented 5 months ago

Closing this issue. We've done our investigation, and the issue has been reported to UCLA. We will run another harvest when UCLA lets us know the issue has been resolved on their end.