scientist-softserv / oral-history

UCLA LIBRARY-CENTER FOR ORAL HISTORY RESEARCH --Documenting the histories of Los Angeles-- The UCLA Library creates a vibrant nexus of ideas, collections, expertise, and spaces in which users illuminate solutions for local and global challenges. We constantly evolve to advance UCLA’s research, education, and public service mission by empowering and
https://oralhistory.library.ucla.edu/

Some interviews not showing up in full text search #73 #17

Closed labradford closed 3 months ago

labradford commented 1 year ago

Interviews ingested after December 2020 do not appear in full text search, only in interview information searches. For example, the interviews in this series do not show up in full text search:

https://oralhistory.library.ucla.edu/?f%5Bseries_facet%5D%5B%5D=Chemical+Entanglements%3A+Oral+Histories+of+Environmental+Illness

Example: Searching for "Alfaro" returns no results in full text search, but the interview does appear in interview information search.
https://oralhistory.library.ucla.edu/catalog/21198-zz002kpkk3?counter=1&q=alfaro

crisr15 commented 1 year ago

Confirm with TKay if this is still active.

crisr15 commented 1 year ago

Confirmed this is still active and needs to be fixed.

aprilrieger commented 1 year ago

OralHistoryItem.import_single('21198-zz002kpkk3')

aprilrieger commented 1 year ago

We need:

item.attribute['transcript_json_t'] << { "transcript_t": transcript }

This should happen once the IndexPdfJob has run, or during the job itself.

In the job:

result = SolrService.extract(path: tmp_file.path)
transcript = result['file'].to_s.strip
result = "file"=>"\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInterview with Evelin Alfaro \n \n\nSESSION 1 (8/17/2020) \n \n\nTimed Log \n \n[00:00:00] Introduction and gives her permission to record the interview. Says she is a \n\nmember of United and Active Women (MUA) and the California Domestic \nWorkers Coalition. She is forty years old. \n\n[00:02:16] Born in Quetzaltenango, Guatemala to a family of seven children. Her dad \nis a carpenter and her mom a stay-at-home mom. They also cultivate corn. \nShe lives close to a forest and a river. She arrives in San Francisco in 2009. \n\n[00:06:57] Wants to talk about the importance of domestic work and she also wants to \nlearn about the study because there is contamination and dangers in her \nhome country as well. For example, they use chemicals on their corn in \nGuatemala without protections like gloves or masks. \n\n[00:09:06] When she was a kid, the corn products irritated her hands – and the smell \nalso irritated her nose. She did not speak to her siblings about the effects. \n\n[00:12:36] Her dad had to use toxic products when he painted houses, and they were \neven worse when it was hot out. She speculates perhaps they affected him. \n\n[00:16:12] Learns to take precautions when cleaning houses like not mixing Clorox \nwith Ajax, opening windows, etc. to avoid reactions. She doesn’t go to the \ndoctor because she doesn’t want to miss a day of work and because she \ncannot pay. \n\n[00:23:06] Continues to work with some toxic products, but there are also some clients \nwho accept vinegar. Symptoms include skin irritation and red eyes. She \nbuys her own protective equipment. \n\n[00:28:53] She likes MUA because they share information and make workers feel more \nsecure about the products they’re using. She declares it is a type of power to \nbe able to participate, receive training, and spread awareness. 
\n\n[00:33:20] A leader in the campaign for SB 1257. She especially likes labor rights. She \nshares her testimony with legislators. She feels proud of her work and her \nefforts informing about and expanding the rights of domestic workers. \n\n[00:38:22] Speaks about the connection between domestic work and domestic violence \nbecause oftentimes workers have to spend time in private homes. \n\n[00:39:48] Talks about the connection between working-class jobs and immigration \nstatus. Without papers, they do not feel the freedom to say look, I am not \ngoing to do this because it puts my life at risk. \n\n[00:41:08] Lack of protections during the pandemic. In general, she emphasizes the \nneed to take care of domestic workers the same as those in any other type of \njob. \n\n\n\n[00:49:38] Final words, logistics, and conclusion. Thank you very much for your time. \n\n \n\n\n\n"
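
As a rough illustration of the step above, the following sketch shows one way the extracted text could be pulled out of a result hash and attached to an item's attributes. The helper names `transcript_from` and `add_transcript!` and the plain `attributes` hash are assumptions made for this example, not code from the repository; only the `'file'`, `transcript_json_t`, and `transcript_t` names come from this thread.

```ruby
# Hypothetical helper: reads the extracted text from a result hash shaped
# like the one shown above and trims leading/trailing whitespace.
def transcript_from(result)
  result['file'].to_s.strip
end

# Hypothetical helper: appends the transcript to a plain attributes hash.
# Nothing is added when the extraction produced no text.
def add_transcript!(attributes, transcript)
  return attributes if transcript.empty?

  attributes['transcript_json_t'] ||= []
  # Same literal as in the comment above; note that in Ruby this form
  # produces the symbol key :transcript_t.
  attributes['transcript_json_t'] << { "transcript_t": transcript }
  attributes
end

# Shortened sample of the extraction output, for illustration only.
result = { 'file' => "\n\n\nInterview with Evelin Alfaro \n \n\nSESSION 1 (8/17/2020) \n" }
attributes = {}
add_transcript!(attributes, transcript_from(result))
```

Guarding against an empty transcript is a design choice in this sketch: it keeps a failed extraction from writing a blank value into the record.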

After calling item.index_record, I inspected the item and saw that this information had not been indexed into Solr.

  def index_record
    SolrService.add(self.to_solr)
    #TODO allow for search capturing
    SolrService.commit
  end
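
One way to confirm whether the transcript reached the index is to check the committed document for a non-empty transcript field. The sketch below uses an in-memory stand-in rather than a real Solr connection; `FakeSolr` and `indexed_with?` are hypothetical names invented for this illustration, and the `transcript_t` field name is taken from the earlier comment.

```ruby
# In-memory stand-in for a Solr client, used only for this illustration.
# Documents become visible in #docs only after #commit, as with add/commit above.
class FakeSolr
  attr_reader :docs

  def initialize
    @docs = {}
    @pending = {}
  end

  def add(doc)
    @pending[doc['id']] = doc
  end

  def commit
    @docs.merge!(@pending)
    @pending.clear
  end
end

# Hypothetical check: true when the committed document for `id`
# has a non-empty value in `field`.
def indexed_with?(solr, id, field)
  value = solr.docs.dig(id, field)
  !(value.nil? || value.empty?)
end

solr = FakeSolr.new
solr.add('id' => '21198-zz002kpkk3') # payload without a transcript
solr.commit
transcript_indexed = indexed_with?(solr, '21198-zz002kpkk3', 'transcript_t')
```

Here `transcript_indexed` is false because the payload never carried the field, which is the situation described above.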

DiemBTran commented 1 year ago

Blocked for QA until we hear back from John about the HTTP basic auth environment variables getting set in the deploy.

DiemBTran commented 1 year ago

Needs further review:

tested on:

  1. I let the Run Importer and Delete Removed Entries run overnight on staging (screenshot)
  2. I did a full text search for a term that appears on both the Evelin Alfaro interview show page and in its PDF transcript (“California Domestic Workers Coalition”) (screenshot)
  3. That search did not return the interview we’re looking for
    1. screenshot of the search results page for the term
    2. screenshot of the search results page for the term PLUS limiting the search by the series it’s in (“Chemical Entanglements: Oral Histories of Environmental Illness”)
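
The two searches in the steps above could be reproduced with URLs built along these lines. This is a minimal sketch: the `q` and `f[series_facet][]` parameters are taken from the URLs earlier in this thread, while `search_url` and `BASE_URL` are hypothetical names, and any extra parameter the site may need to select full text search specifically is not shown here because it does not appear in the thread.

```ruby
require 'uri'

# Production host from the issue; swap in the staging or test host as needed.
BASE_URL = 'https://oralhistory.library.ucla.edu/'

# Hypothetical helper: builds a search URL for a query term,
# optionally limited to a single series facet value.
def search_url(query:, series: nil, base: BASE_URL)
  params = [['q', query]]
  params << ['f[series_facet][]', series] if series
  "#{base}?#{URI.encode_www_form(params)}"
end

term = 'California Domestic Workers Coalition'
series = 'Chemical Entanglements: Oral Histories of Environmental Illness'

plain_url = search_url(query: term)
faceted_url = search_url(query: term, series: series)
```

`URI.encode_www_form` percent-encodes the brackets and colon, so the facet portion of `faceted_url` matches the encoded series link in the original report.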

labradford commented 1 year ago

The IndexPdfTranscriptJob is failing on the test server: https://oralhistory-test.library.ucla.edu/delayed_job/failed

aprilrieger commented 7 months ago

This is live on the site.