ucldc / rikolti

calisphere harvester 2.0
BSD 3-Clause "New" or "Revised" License

[bug] 400 Client Error when fetching Nuxeo collection #1000

Closed by gamontoya 5 months ago

gamontoya commented 5 months ago

Re-harvesting a Nuxeo collection that was recently harvested successfully now fails at the fetch_collection phase:

[2024-06-11, 23:39:06 UTC] {{standard_task_runner.py:104}} ERROR - Failed to execute job 140492 for task fetching.fetch_collection (400 Client Error: for url: https://nuxeo.cdlib.org/Nuxeo/site/api/v1/search/lang/NXQL/execute?pageSize=100&currentPageIndex=0&query=SELECT+%2A+FROM+SampleCustomPicture%2C+CustomFile%2C+CustomVideo%2C+CustomAudio%2C+CustomThreeD+WHERE+ecm%3Apath+STARTSWITH+%27%2Fasset-library%2FUCB%2FUCB+Ethnic+Studies%2FAAS%2FWei+Min+She+and+Asian+Community+Center+photographs%2C+1970-1980%2FAsian+Community+Center+Children%27s+Program%27+AND+ecm%3AisTrashed+%3D+0+ORDER+BY+ecm%3Apos; 11642)

christinklez commented 5 months ago

Perhaps this is due to special characters in the URL; the title may contain quotes. We could explore changing the query to use UUIDs instead of the path.

barbarahui commented 5 months ago

This should be fixed with this PR: https://github.com/ucldc/rikolti/pull/1011

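The problematic path has a single quote in it (the apostrophe in "Children's Program"), which presumably ends the quoted path literal in the NXQL query early. Querying by ecm:ancestorId rather than ecm:path gets around this problem.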
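
To illustrate the difference, here is a minimal sketch of the two query shapes. It is not the code from the PR: the function names, the quote-balance check, and the all-zero UUID are hypothetical, and only the document types, the path, and the query clauses are taken from the failing URL above.

```python
from urllib.parse import urlencode

# Document types copied from the failing request in the log above.
DOC_TYPES = "SampleCustomPicture, CustomFile, CustomVideo, CustomAudio, CustomThreeD"


def query_by_path(path: str) -> str:
    """Build the path-based NXQL query (hypothetical helper)."""
    return (
        f"SELECT * FROM {DOC_TYPES} "
        f"WHERE ecm:path STARTSWITH '{path}' "
        "AND ecm:isTrashed = 0 ORDER BY ecm:pos"
    )


def query_by_ancestor(ancestor_uid: str) -> str:
    """Build the ancestor-based NXQL query (hypothetical helper)."""
    return (
        f"SELECT * FROM {DOC_TYPES} "
        f"WHERE ecm:ancestorId = '{ancestor_uid}' "
        "AND ecm:isTrashed = 0 ORDER BY ecm:pos"
    )


def has_balanced_quotes(nxql: str) -> bool:
    """Rough check: an odd number of single quotes means a literal is cut short."""
    return nxql.count("'") % 2 == 0


# Path decoded from the failing URL; note the apostrophe in "Children's".
path = (
    "/asset-library/UCB/UCB Ethnic Studies/AAS/"
    "Wei Min She and Asian Community Center photographs, 1970-1980/"
    "Asian Community Center Children's Program"
)
# Placeholder UUID, assumed for illustration only.
folder_uid = "00000000-0000-0000-0000-000000000000"

by_path = query_by_path(path)
by_ancestor = query_by_ancestor(folder_uid)

# Query-string parameters as they appear in the failing request.
params = urlencode({"pageSize": 100, "currentPageIndex": 0, "query": by_ancestor})

if __name__ == "__main__":
    print("path query balanced:", has_balanced_quotes(by_path))
    print("ancestor query balanced:", has_balanced_quotes(by_ancestor))
    print(params)
```

Because a UUID contains only hexadecimal digits and hyphens, the ancestor-based query has no characters from the folder title to escape. If a path-based query were still needed, the quote in the path would have to be escaped before being placed in the literal.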

christinklez commented 5 months ago

Thanks so much, @barbarahui! I ran a harvest and it finished successfully. Thank you!