scientist-softserv / louisville-hyku


Leader not searching searchable text #173

Open crisr15 opened 1 year ago

crisr15 commented 1 year ago

Summary

The Leader collection (see https://hyku-docker-dev.library.louisville.edu/collections/ulua_leader?locale=en) is not searching the searchable_text field.

  1. In CONTENTdm, 16 issues of the Leader come up with a search for the Ballard Chefs (see BallardChefs_CDM.jpg).

  2. In Hyku, only 2 issues of the Leader come up with the same catalog search (see BallardChefs_Hyku.jpg).

  3. When I retype the search terms into the search box on the viewer for one of those two issues, I get 2 results. One is an ad (which had not been transcribed in our crowdsourcing project), and the keyword is highlighted. When I search the page to pick up the searchable text metadata (which does appear in the viewer), I get zero results (see BallardChefs_Ad.jpg).

  4. When I search for an issue that appeared in those CONTENTdm results but not in the Hyku results, the viewer search does pick up some text, although it’s not highlighted and it did not show up in catalog search (see BallardChefs_19300426.jpg). This text had also been transcribed, so page search also highlights it in the searchable text metadata field.

For this collection, due to the transcription project, we had requested catalog search to use the searchable_text field, despite the fact that we’d be sacrificing the highlighting of OCRed text within the viewer. Not only is the catalog search not using the searchable_text field, it doesn’t seem to be making full use of the OCRed text either.

Screenshots

![Image](https://user-images.githubusercontent.com/11359350/217850323-c4d3aa5f-40f6-4164-a5e2-d27d064a1449.jpg)
![Image](https://user-images.githubusercontent.com/11359350/217850384-e64f7c13-be38-46ea-9398-a5eb9f888ac3.jpg)
![Image](https://user-images.githubusercontent.com/11359350/217850620-abb266fd-a482-4afb-9a6b-4db5fba9f6aa.jpg)
![Image](https://user-images.githubusercontent.com/11359350/217850699-c5be1bc0-8972-45d1-944e-ab2712d7eb39.jpg)

Files

https://drive.google.com/drive/folders/1UteJ1xOLvA1Dy0tlnf1xeqHbnNM8Y-LS?usp=share_link

Acceptance Criteria

Testing Instructions

Testing instructions: Signed in as an admin:

As a user: go to: https://louisville-hyku-staging.notch8.cloud/collections/ulua_leader?locale=en

If you want to test a fresh import you can use this subset of leader_collection:

Archive.zip

rachelihoward commented 1 year ago

Sample files and CSV re-shared via Dropbox - please confirm receipt so they can be deleted or shared via another method!

aprilrieger commented 1 year ago

Added notes in our team chat: https://assaydepot.slack.com/archives/C03B6A93E31/p1679708011289029

rachelihoward commented 1 year ago

April - I can't see that message. Crystal said you need login information to staging - if you mean our docker-dev instance, it's testing/testing.

aprilrieger commented 1 year ago

Hi @rachelihoward yes, https://hyku-docker-dev.library.louisville.edu/collections/ulua_leader?locale=en, thank you I am in!

rachelihoward commented 1 year ago

It's testing/testing.

Thanks for clarifying!

Rachel

From: April Rieger Sent: Tuesday, March 28, 2023 3:44 PM Subject: Re: [scientist-softserv/louisville-hyku] Leader not searching searchable text (Issue #173)

Hi @rachelihoward yes, https://hyku-docker-dev.library.louisville.edu/collections/ulua_leader?locale=en, is behind basic http auth with a username/pw combo I do not have on file.

![Image](https://user-images.githubusercontent.com/63515648/228349628-010298fa-694a-4e34-a851-1d061b7629cf.png)

Thank you!

aprilrieger commented 1 year ago

Adding some notes about our investigation and analysis

  1. Figure out what the "child" class is (Text).
  2. See if the child has OCR text on it when imported (it should).
     2a. See if this text is searchable elsewhere in the app, e.g. catalog search (it is not, in either catalog search or advanced search).
  3. Removed unnecessary files/works/texts/etc., took searchable_text off the parent work and put it on the child work; it is now searchable.
  4. Take the OCR text from the child, put it on the parent, and see if it is then searchable.
     4a. Two options:
         i) Put child_obj.searchable_text into parent_obj.searchable_text. Saving triggers indexing, and the text goes into the Solr doc.
         ii) Skip searchable_text and put the text straight into the parent Solr doc.

Structure of data: Collection --> Texts --> Texts with FileSets and searchable_text

Findings:

Conclusions:

Assumed intended behavior: When I search for "Quinn church" (given the proper set of data), I expect to see the parent Text in the results. The parent Text should not hold the searchable text itself; it is returned because its child Text's searchable_text contains the search term.
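
As a rough illustration of that expectation, the sketch below models the parent/child structure in plain Ruby. The `TextWork` struct, the `search_parents` helper, and the sample page text are all made up for illustration; they are not Hyku or Hyrax classes or real Leader data, and the real search goes through Solr rather than an in-memory scan.

```ruby
# Hypothetical stand-in for a Text work; not a Hyku/Hyrax class.
TextWork = Struct.new(:title, :searchable_text, :children, keyword_init: true)

# Returns the parents that have at least one child whose searchable_text
# contains every query term (case-insensitive substring match).
def search_parents(parents, query)
  terms = query.downcase.split
  parents.select do |parent|
    Array(parent.children).any? do |child|
      text = child.searchable_text.to_s.downcase
      terms.all? { |term| text.include?(term) }
    end
  end
end

# Made-up sample data: two issues (parents), each with page-level children.
page1 = TextWork.new(title: "Page 1", searchable_text: "Services at Quinn Chapel church this Sunday")
page2 = TextWork.new(title: "Page 2", searchable_text: "Classified advertisements")
issue_a = TextWork.new(title: "Issue A", children: [page1, page2])
issue_b = TextWork.new(title: "Issue B", children: [TextWork.new(title: "Page 1", searchable_text: "Sports news")])

hits = search_parents([issue_a, issue_b], "Quinn church")
puts hits.map(&:title).inspect # prints ["Issue A"]
```

Note that `issue_a` carries no searchable text of its own; it is returned only because one of its pages matches.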

Implementation options:

  1. Change search behavior to include the ability to search "through" a parent to look at its children's searchable_text. This means the parent Text will be returned in the search results when a hit is found on its child (even if the parent doesn't have the metadata itself).
     1a. Kiah is pretty sure this is how the PALs implementation works.
  2. Take the searchable_text from all the child Texts and put it in their parent's searchable_text.
     2a. This would necessitate some UI/UX improvements; specifically, truncating the Searchable Text field in the search results (catalog index page), or removing the field altogether.
    parent_obj.searchable_text = child_obj.searchable_text
    parent_obj.save!

Approach 1: Change search behavior to include ability to search "through" a parent to look at its children's searchable_text.
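
One possible way to get this "search through" behavior is Solr's join query parser, which matches child documents and returns the parents that reference them. The sketch below only builds the query string. It assumes children are indexed with a `searchable_text_tesim` field and that parents list their children's IDs in `member_ids_ssim`; both field names and the `child_text_join_query` helper are assumptions for illustration and should be checked against the actual index. It is not necessarily how the fix for this ticket was implemented.

```ruby
# Builds a Solr query that matches child documents on a phrase in their text
# and returns the parent documents that list those children as members.
# Field names are assumptions; verify them against the real Solr schema.
def child_text_join_query(phrase, text_field: "searchable_text_tesim", members_field: "member_ids_ssim")
  # Escape characters that would break out of a quoted Solr phrase.
  escaped = phrase.gsub(/(["\\])/) { "\\#{$1}" }
  # {!join from=id to=<members_field>}: take the ids of the matching children
  # and return documents whose members field contains one of those ids.
  %({!join from=id to=#{members_field}}#{text_field}:"#{escaped}")
end

puts child_text_join_query("Quinn church")
# prints {!join from=id to=member_ids_ssim}searchable_text_tesim:"Quinn church"
```

In a catalog search, a clause like this would presumably be combined with the regular query so that parents still match on their own metadata as well.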

Advantages:

Disadvantages:

Approach 2: Take the searchable_text from all the child Texts and put it in their parent's searchable_text.
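
The two-line snippet under the implementation options copies a single child's text. With several pages per issue, the parent would need the text of every child. A minimal sketch in plain Ruby follows; `Work` is a hypothetical stand-in for the real model, `copy_children_text_to_parent` is a made-up helper, and searchable_text is assumed to be a single string. On a real work, `save!` would follow, which per the notes above triggers indexing.

```ruby
# Hypothetical stand-in for a Text work; not the real model class.
Work = Struct.new(:title, :searchable_text, :children, keyword_init: true)

# Collects the non-empty searchable_text of every child, in order, and
# stores the combined text on the parent. Returns the parent.
def copy_children_text_to_parent(parent)
  texts = Array(parent.children).map { |child| child.searchable_text.to_s.strip }
  parent.searchable_text = texts.reject(&:empty?).join("\n\n")
  parent
end

# Made-up sample data: one issue with three pages, one of them untranscribed.
issue = Work.new(title: "Issue A", children: [
  Work.new(title: "Page 1", searchable_text: "First page text "),
  Work.new(title: "Page 2", searchable_text: nil),
  Work.new(title: "Page 3", searchable_text: "Third page text")
])

copy_children_text_to_parent(issue)
puts issue.searchable_text
```

The combined field grows with every page, which is why option 2a calls for truncating or hiding Searchable Text on the catalog index page.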

Advantages:

Disadvantages:

rachelihoward commented 1 year ago

Checking email from the airport on a phone but wanted to quickly clarify that the intent was to search searchable_text on the child (page). Each page was transcribed, and we want users to get to the issue (parent) when searching for any word on one of its pages (child searchable_text). I believe when it was functioning before, users then had to redo the search to navigate to the page. It would not be any more user-friendly to load all searchable_text to the parent: users would still have to navigate through all of the pages to find the one with the text. We can train super users to redo the search. We’d like it to work as it was before.


rachelihoward commented 1 year ago

Now that I'm not trying to read on my phone, I think we're on the same page, and that Approach 1 is what we want.

Thanks,

Rachel


aprilrieger commented 1 year ago

@rachelihoward This ticket is ready for review whenever you are, as is #174.

rachelihoward commented 1 year ago

The search did work but the viewer on your staging is different than ours so it's hard to gauge whether it's actually working as expected. On our viewer, the child-level metadata (including full text) appears at the right, and the images of each child appear at the left, with blue markers helping navigate to the appropriate child when searching within the parent.

rachelihoward commented 1 year ago

This is what my search brings up on your staging.

Image

rachelihoward commented 1 year ago

Here's the viewer on our site.

Image

rkuehn-uofl commented 1 year ago

UofL UV Theme

aprilrieger commented 1 year ago

@rkuehn-uofl How can I resolve this issue on our staging? We are seeing this behavior after merging your main into our main. Just linking doesn't help me understand next steps. Can you define the steps I need to take to get your UV styles to show up in our staging?

rkuehn-uofl commented 1 year ago

@aprilrieger If you copy the uv-en-uofl-theme directory over to /louisville-hyku/public/uv/themes/ and the uv-config.json file to /louisville-hyku/public/uv/, you should be set.
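
Those two copy steps could be scripted roughly as follows. The `install_uv_theme` helper and its arguments are hypothetical: `source_dir` is assumed to be wherever the `uv-en-uofl-theme` directory and `uv-config.json` file were obtained, and `app_root` the root of the louisville-hyku checkout.

```ruby
require "fileutils"

# Copies the UV theme directory and config file into the app's public/uv tree.
def install_uv_theme(source_dir, app_root)
  uv_dir = File.join(app_root, "public", "uv")
  themes_dir = File.join(uv_dir, "themes")
  FileUtils.mkdir_p(themes_dir)
  # cp_r into an existing directory places the theme directory inside it.
  FileUtils.cp_r(File.join(source_dir, "uv-en-uofl-theme"), themes_dir)
  FileUtils.cp(File.join(source_dir, "uv-config.json"), uv_dir)
end

# Example call (both paths are placeholders):
# install_uv_theme("/path/to/theme-files", "/louisville-hyku")
```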