scientist-softserv / oral-history

UCLA LIBRARY-CENTER FOR ORAL HISTORY RESEARCH --Documenting the histories of Los Angeles-- The UCLA Library creates a vibrant nexus of ideas, collections, expertise, and spaces in which users illuminate solutions for local and global challenges. We constantly evolve to advance UCLA’s research, education, and public service mission by empowering and
https://oralhistory.library.ucla.edu/

Non-indexing transcript PDFs and/or XML #60 #3

Open labradford opened 1 year ago

labradford commented 1 year ago

I'm trying to track down some functional inconsistencies and log errors. I have been seeing an issue when trying to display the XML transcript for items.

Next to the session item (bottom half of the page, on the right), if you click the 'play' icon, an XML box appears with the indexed transcript. The first case below shows the expected behavior; the second does not.

Proper behavior: https://oralhistory.library.ucla.edu/catalog/21198-zz002kd7t9

Non-working behavior: https://oralhistory.library.ucla.edu/catalog/21198-zz002kpjm4

JavaScript error: GET blob:https://oralhistory.library.ucla.edu/8c6c73a5-57a4-44c5-a22d-df6760328a7f net::ERR_FILE_NOT_FOUND

I suspect the XML is not getting indexed for some reason. Are there any hard-coded paths or quirks that might cause this issue?

Possibly related, but perhaps not: we have a decent number of PDF indexing errors in the log. Example:

job_class: IndexPdfTranscriptJob
job_id: 5cf81f52-27ae-4db6-909e-e625f6e7826d
provider_job_id:
queue_name: default
priority:
arguments:

aprilrieger commented 1 year ago

This is solved with 17-index-pdf-transcript-metadata

aprilrieger commented 9 months ago

This needs rework, which will be done during the OAI feed update.