pulibrary / pulfalight

This is an implementation of the Princeton University Library Finding Aids (PULFA) service using ArcLight

Searching within a collection by Box number not working #1201

Open · ccleeton opened this issue 1 year ago

ccleeton commented 1 year ago

From Charles Doran

Searching by box and folder number is not working. For example, if I search for Box 40 in C0187, I get all the folders numbered 40 in every box in this collection. Additionally, if I search for Box 40, Folder 63, I get no results.

I tried this in a few other collections and had similar results.

I am marking this request as urgent because this function worked previously and it is badly needed when requesting a specific box.

ccleeton commented 1 year ago

This search function was used successfully within the past week or two. I'm not sure when it stopped working.

tpendragon commented 1 year ago

A little diagnosing:

We create a location_info_tesim field to enable searching by box/folder via keyword search. For the record we want returned, that field contains "C0187 Box 40, Folder 1-82, Box 41, Folder 1-102, Box 42, Folder 1-102, Box 43, Folder 1-112, Box 44, Folder 1-102, Box 45, Folder 1-17"

This won't match on "Box 40, Folder 63" because that string isn't in the field. I think we'll have to build a better box/folder field that contains one entry per folder, like "C0187 Box 40, Folder 1", "C0187 Box 40, Folder 2", and so on, and boost it appropriately.
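To make the proposed per-folder field concrete, here is a minimal Python sketch that expands the folder ranges in a location_info_tesim value into one entry per folder. It assumes every range follows the "Box N, Folder A-B" pattern in the example value above; real finding-aid data may use other formats. The function name `expand_box_folders` is hypothetical, and this is not Pulfalight's actual indexing code.

```python
import re

# Assumed pattern, taken from the example location_info_tesim value:
# "Box N, Folder A-B" or "Box N, Folder A" (single folder, no range).
BOX_FOLDER = re.compile(r"Box (\d+), Folder (\d+)(?:-(\d+))?")


def expand_box_folders(location_info: str) -> list[str]:
    """Hypothetical helper: expand folder ranges into one
    "<collection> Box N, Folder M" string per folder."""
    collection = location_info.split(" ", 1)[0]  # e.g. "C0187"
    entries = []
    # findall returns (box, first, last) tuples; last is "" when there is no range.
    for box, first, last in BOX_FOLDER.findall(location_info):
        start = int(first)
        end = int(last) if last else start
        for folder in range(start, end + 1):
            entries.append(f"{collection} Box {box}, Folder {folder}")
    return entries


if __name__ == "__main__":
    sample = "C0187 Box 40, Folder 1-82, Box 41, Folder 1-102"
    entries = expand_box_folders(sample)
    print(len(entries))  # 82 + 102 = 184
    print("C0187 Box 40, Folder 63" in entries)  # True
```

Applied to the full C0187 value above, this yields 517 entries (82 + 102 + 102 + 112 + 102 + 17). That is the trade-off: a much larger multivalued field per record in exchange for exact-phrase hits like "C0187 Box 40, Folder 63", and it would still need boosting to rank well.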

For some reason, records matching "Box 40" are ranked as less relevant than records matching "Folder 40". I don't know why; maybe we need to boost the pf2 field? We'll have to test, probably by adding https://findingaids.princeton.edu/catalog/C0187_c67449-03877 and https://findingaids.princeton.edu/catalog/C0187_c68425-07869 to the fixtures and making sure they're returned in the right order when searching for "Box 40".

tpendragon commented 1 year ago

@ccleeton @faithc After some diagnosis, I think I've determined that nothing has changed: it has worked this way since we launched. I agree we need to fix this for "Folder 1-X," but could you provide some examples of how this is stopping y'all from doing your work? Have there been times you've been unable to get the job done?

We've got a lot on our plate right now with this preservation work, some Ruby upgrades, and the upcoming holidays, so I'm tempted to push this to the next Pulfalight cycle, but I want to be sympathetic to what y'all are struggling with.

Some examples that do work:

I also think this ticket could take two routes. One is "If I search Box 9, I ONLY want to see things that are in Box 9," which would be quite difficult. The other is "If I search Box 9 Folder 4, it should return something that's Box 9 Folder 1-6 as the first result," which is a little easier but still a significant change that will require a full reindex.
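The second route boils down to a containment check: a query for Box 9, Folder 4 should match a record whose location covers Box 9, Folder 1-6. The Python sketch below illustrates only that matching rule, under assumed formats (a "Box N, Folder M" query and "Box N, Folder A-B" ranges in the stored value). The names `parse_ranges` and `matches` are hypothetical; in practice the rule would have to be expressed in the Solr index and query, which is why it needs a reindex.

```python
import re
from typing import NamedTuple


class FolderRange(NamedTuple):
    box: int
    first: int
    last: int


# Assumed formats: stored ranges look like "Box 40, Folder 1-82";
# queries look like "Box 40 Folder 63" or "box 40, folder 63".
RANGE = re.compile(r"Box (\d+), Folder (\d+)(?:-(\d+))?")
QUERY = re.compile(r"box\s+(\d+)\s*,?\s*folder\s+(\d+)", re.IGNORECASE)


def parse_ranges(location_info: str) -> list[FolderRange]:
    """Hypothetical helper: turn each "Box N, Folder A-B" into a FolderRange."""
    return [
        FolderRange(int(box), int(first), int(last or first))
        for box, first, last in RANGE.findall(location_info)
    ]


def matches(query: str, location_info: str) -> bool:
    """True if the queried box/folder falls inside any stored folder range."""
    m = QUERY.search(query)
    if m is None:
        return False  # query doesn't name both a box and a folder
    box, folder = int(m.group(1)), int(m.group(2))
    return any(
        r.box == box and r.first <= folder <= r.last
        for r in parse_ranges(location_info)
    )


if __name__ == "__main__":
    print(matches("Box 9 Folder 4", "Box 9, Folder 1-6"))  # True
    print(matches("Box 9 Folder 7", "Box 9, Folder 1-6"))  # False
```

Note that a query naming only a box ("Box 40") falls through to `False` here; handling that case well is the harder first route.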

What do you think?

hackartisan commented 1 year ago

Relevant prior work: https://github.com/pulibrary/pulfalight/pull/522

ccleeton commented 1 year ago

Sorry for the delay. As long as it is working as it did before, let's leave the ticket open, and hopefully it can be worked on during a sprint. At the very least, searching by box directly is needed when a researcher or staff member knows exactly which box something is in but not the file name.

One last question: is the reason Box 40 doesn't show up in the search for C0187 that it is listed as part of a range of boxes? https://findingaids.princeton.edu/catalog/C0187_c68425-07869

hackartisan commented 1 year ago

@ccleeton, @tpendragon says yes, that's the reason.

Thank you!!

ccleeton commented 1 year ago

OK, so can we leave this ticket open in the hope that search can handle box ranges in the future?