Closed christinklez closed 3 months ago
It looks like these objects have a nuxeo type of SampleCustomPicture, but the content file is a PDF. Based on the nuxeo type, the content harvester is expecting to get an image file to convert to jp2. However, it encounters the PDF and so throws an error. I think we've had this issue with nuxeo objects before -- do you remember how we resolved it? We could have the contributor re-create these objects as images.
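To illustrate the mismatch described above, here is a minimal sketch of a pre-check that could flag such objects before the jp2 conversion is attempted. This is not the harvester's actual code: the record fields (`type`, `filename`), the function names, and the set of image doc types are all assumptions made for illustration. Only the `SampleCustomPicture` type name comes from this thread.

```python
import mimetypes

# Assumption: doc types whose content file is expected to be an image.
# Only "SampleCustomPicture" is taken from this issue; extend as needed.
IMAGE_DOC_TYPES = {"SampleCustomPicture"}


def content_mismatch(doc_type: str, filename: str) -> bool:
    """Return True when an image doc type points at a non-image file."""
    if doc_type not in IMAGE_DOC_TYPES:
        return False
    # Guess the MIME type from the file extension (stdlib lookup table).
    mime, _ = mimetypes.guess_type(filename)
    return mime is None or not mime.startswith("image/")


def partition_records(records):
    """Split records into (harvestable, mismatched) lists.

    Each record is a dict with hypothetical keys "type" and "filename".
    """
    ok, bad = [], []
    for rec in records:
        target = bad if content_mismatch(rec["type"], rec["filename"]) else ok
        target.append(rec)
    return ok, bad


if __name__ == "__main__":
    sample = [
        {"type": "SampleCustomPicture", "filename": "flyer.pdf"},
        {"type": "SampleCustomPicture", "filename": "photo.tif"},
        {"type": "File", "filename": "report.pdf"},
    ]
    harvestable, mismatched = partition_records(sample)
    print("harvestable:", [r["filename"] for r in harvestable])
    print("mismatched:", [r["filename"] for r in mismatched])
```

A check like this would let a harvest skip or report the mismatched objects instead of failing on the first PDF, though whether skipping or failing is the right behavior is a separate decision.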
Thanks @barbarahui!
Nuxeo project folder: https://nuxeo.cdlib.org/nuxeo/nxdoc/default/a2dcac48-b8fb-453a-9c21-88603a24da7f/view_documents
CSphere only has the 28 records published on production. All 43 Nuxeo records were created/modified ~April/June 2023. It looks like CSphere may have last harvested ~Nov 2023.
I suspect the legacy harvester skipped these "image" records that have PDFs as their main file.
Removing the bug label from this issue. This is a Nuxeo data entry/creation issue.
@aturner let's discuss how we should approach this!
@christinklez @barbarahui -- ah, the Nuxeo object doc type wasn't set correctly when the PDFs were imported (it should be the "File" doc type). We've run into this issue before, and my understanding is that there's no way to retroactively change the Nuxeo doc type -- the object needs to be rebuilt.
Christine, I can relay this to Sine at the UCB Ethnic Studies Library and ask them to rebuild the objects. The PDF objects that need rebuilding can be sussed out from the results view: https://nuxeo.cdlib.org/nuxeo/nxpath/default/asset-library/UCB/UCB%20Ethnic%20Studies/CES/TWLF%2050th%20Anniversary%20Digital%20Scans@view_documents?tabIds=%3A&conversationId=0NXMAIN6
FD ticket -- message to Sine at UCB Ethnic Studies Library: https://help.oac.cdlib.org/a/tickets/137436
UCB has moved these objects into a "do not publish" folder. This collection is now harvested through!
Closing this issue as resolved. When UCB updates these objects, they will send over a harvesting request / update.
Mapper: Nuxeo
Collection ID: 28042
Run ID: manual2024-04-23T23:37:41+00:00
Permalink to the log: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/log?dag_id=harvest_collection&task_id=content_harvesting.content_harvest&execution_date=2024-04-23T23%3A37%3A41%2B00%3A00&map_index=0
Link to the gridview: https://7a8067cb-3b99-477e-a883-7e311175a9b4.c3.us-west-2.airflow.amazonaws.com/dags/harvest_collection/grid?dag_run_id=manual2024-04-23T23%3A37%3A41%2B00%3A00&task_id=content_harvesting.content_harvest&tab=mapped_tasks&num_runs=365&map_index=0