Research: Tab results do not appear when "&" is used in a simple search.

roamye commented 2 months ago

Problem Description: Lux is supposed to ignore punctuation when a user submits a search. However, when a user includes '&' in a simple search, no results appear in any of the scope tabs. This issue serves as a research ticket to figure out why this bug happens and what a potential solution could be.

Expected Behavior/Solution: Research issue on why this happens and what a possible solution could be.

Requirements: TBD based on proposed solution.

Needed for promotion: If an item on the list is not needed, it should be crossed off but not removed.

[ ] Wireframe/Mockup - Mike
[ ] Committee discussions - Sarah
[ ] Feasibility/Team discussion - Sarah
[ ] Backend requirements - TBD
[ ] Frontend requirements - TBD
[ ] Questions
List of questions for discussions. Answers should be documented within the issue.

UAT/LUX Examples:

Dependencies/Blocks:

Blocked By: Issues that are blocking the completion of the current issue.
Blocking: Issues being blocked by the completion of the current issue.

Related Github Issues:

Issues that contain similar work but are not blocking or being blocked by the current issue.

Related links:

BugHerd: https://www.bugherd.com/projects/284041/tasks/2153

Wireframe/Mockup: Place wireframe/mockup for the proposed solution at end of ticket.

brent-hartwig commented 2 months ago

@kamerynB and @clarkepeterf, I suspect the issue has to do with how the search criteria are encoded in the frontend's calls to /api/search-estimate/[scope]?q=[criteria], specifically when the criteria contain an ampersand. If the criteria are not encoded, the q parameter value gets truncated at the '&', and perhaps the middle tier drops what it perceives as the following parameter when passing the request on to the backend.

Take the search for "Japanese & America" as an example. The search URL is properly encoded:

https://lux.collections.yale.edu/view/results/objects/?q=%7B%22AND%22%3A%5B%7B%22text%22%3A%22Japanese%22%2C%22_lang%22%3A%22en%22%7D%2C%7B%22text%22%3A%22%26%22%2C%22_lang%22%3A%22en%22%7D%2C%7B%22text%22%3A%22America%22%2C%22_lang%22%3A%22en%22%7D%5D%7D&sq=Japanese+%26+America

Whereas the search estimate requests are not:

https://lux.collections.yale.edu/api/search-estimate/item?q={%22AND%22:[{%22text%22:%22Japanese%22,%22_lang%22:%22en%22},{%22text%22:%22&%22,%22_lang%22:%22en%22},{%22text%22:%22America%22,%22_lang%22:%22en%22}]}

Arguably, the backend's translate endpoint should do this; however, the frontend must already be doing something to make the main search request work. Please advise how you would like this addressed. Thank you.
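To illustrate the truncation hypothesis, here is a small TypeScript sketch (not LUX code) that feeds the same criteria through the WHATWG `URLSearchParams` parser, which splits a query string on '&'. It assumes the middle tier parses query strings in a similar way; if so, the unencoded q value is cut off at the '&' term, while the encoded value survives intact.

```typescript
// Illustration only: the criteria from the "Japanese & America" example.
const criteria = {
  AND: [
    { text: "Japanese", _lang: "en" },
    { text: "&", _lang: "en" },
    { text: "America", _lang: "en" },
  ],
};
const json = JSON.stringify(criteria);

// Unencoded: raw JSON concatenated into the query string (as search-estimate appears to do).
const unencoded = new URLSearchParams(`q=${json}`);
// Encoded: the value is escaped first (as the results-page URL already is).
const encoded = new URLSearchParams(`q=${encodeURIComponent(json)}`);

// Collect every parameter name the parser sees in the unencoded case.
const unencodedKeys: string[] = [];
unencoded.forEach((_value, key) => {
  unencodedKeys.push(key);
});

// The raw '&' inside {"text":"&"} ends the q value early...
console.log(unencoded.get("q")); // {"AND":[{"text":"Japanese","_lang":"en"},{"text":"
// ...and the remainder becomes a bogus, value-less parameter.
console.log(unencodedKeys.length); // 2
// The encoded value decodes back to the complete JSON.
console.log(encoded.get("q") === json); // true
```

If the middle tier behaves like this parser, the backend would receive a truncated, invalid JSON string for q, which would be consistent with the empty tab results.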

clarkepeterf commented 2 months ago

@brent-hartwig Yes, looks like the frontend is properly encoding the URI components for search but not for search-estimate. That should be fixed. But if punctuation is to be ignored, does that mean we should remove punctuation on the frontend before sending to ML? Or do we need to update the logic in ML to remove punctuation? I'd lean toward removing punctuation in ML.
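A minimal sketch of the frontend fix, assuming the search-estimate URL is currently built by string concatenation. `buildSearchEstimateUrl` is a hypothetical helper name, not an existing LUX function; the point is to run the serialized criteria through `encodeURIComponent` before appending it as q, matching what the results-page URL already does.

```typescript
// Hypothetical helper; the name and signature are illustrative, not LUX code.
type SearchCriteria = Record<string, unknown>;

function buildSearchEstimateUrl(
  baseUrl: string,
  scope: string,
  criteria: SearchCriteria,
): string {
  // Escape the serialized JSON so reserved characters such as '&', '=', '#'
  // and '+' inside search terms cannot split or truncate the q parameter.
  const q = encodeURIComponent(JSON.stringify(criteria));
  // Escape the scope too, in case it ever contains reserved characters.
  return `${baseUrl}/api/search-estimate/${encodeURIComponent(scope)}?q=${q}`;
}

const criteria: SearchCriteria = {
  AND: [
    { text: "Japanese", _lang: "en" },
    { text: "&", _lang: "en" },
    { text: "America", _lang: "en" },
  ],
};

const url = buildSearchEstimateUrl("https://lux.collections.yale.edu", "item", criteria);

// Parsing the URL back yields the complete criteria, '&' term included.
const roundTrip = new URL(url).searchParams.get("q");
console.log(roundTrip === JSON.stringify(criteria)); // true
```

`encodeURIComponent` is preferred here over `URLSearchParams` serialization because it writes spaces as %20 rather than '+', which matters if the receiving side decodes with a plain URI decoder.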

brent-hartwig commented 2 months ago

@clarkepeterf, at present, the backend strips out punctuation-only search terms while converting the search criteria into the query, specifically here. We could refactor and have the translate endpoint also do this, but for backend consumers that do not go through the translate endpoint, we'll still need the current code path covered.
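For discussion, a TypeScript sketch of the punctuation-only filtering described above. The real logic lives in the backend's MarkLogic query conversion; the flat `Term`/`Criteria` shapes and the definition of punctuation (Unicode `\p{P}` plus whitespace) are assumptions made for illustration.

```typescript
// Hypothetical shapes for a flat AND of text terms (an assumption; real criteria can nest).
type Term = { text: string; _lang?: string };
type Criteria = { AND: Term[] };

// Assumption: a term is "punctuation-only" when every character is Unicode
// punctuation (\p{P}, which includes '&') or whitespace. Built with the RegExp
// constructor so the "u" flag works regardless of the TypeScript target.
const PUNCTUATION_ONLY = new RegExp("^[\\p{P}\\s]+$", "u");

// Return a copy of the criteria without punctuation-only terms.
function dropPunctuationOnlyTerms(criteria: Criteria): Criteria {
  return { AND: criteria.AND.filter((term) => !PUNCTUATION_ONLY.test(term.text)) };
}

const cleaned = dropPunctuationOnlyTerms({
  AND: [
    { text: "Japanese", _lang: "en" },
    { text: "&", _lang: "en" },
    { text: "America", _lang: "en" },
  ],
});

console.log(cleaned.AND.map((term) => term.text).join(" ")); // Japanese America
```

Whichever layer applies it, a filter like this only helps once the '&' actually reaches the backend, so it complements rather than replaces the frontend encoding fix.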

azaroth42 commented 1 month ago

Boo! I have a demo where the main culprit is "McKinsey & Company" :(

brent-hartwig commented 1 month ago

@azaroth42, which environment would you need this fixed in before your demo?

@clarkepeterf or @kamerynB, based on Rob's answer, please consider options for making the following edit, consulting @prowns and @jffcamp as needed. Thank you.

Yes, looks like the frontend is properly encoding the URI components for search but not for search-estimate. That should be fixed.

project-lux / lux-marklogic

Research: Tab results do not appear when "&" is used in a simple search. #111