project-lux / lux-marklogic

Code, issues, and resources related to LUX MarkLogic

Enable search to be unfiltered #223

Closed: brent-hartwig closed this issue 1 month ago

brent-hartwig commented 1 month ago

Problem Description: LUX can likely switch from filtered searches to unfiltered searches without introducing an unacceptable number of false positives (results that the filtering process would have removed). Benefits:

  1. Search results will match the other unfiltered contexts: estimates and facets. If there is any disparity among the three, it is highly likely a bug in our code (e.g., #37).
  2. Unfiltered searches are faster, significantly so at times, including with deep pagination (https://github.com/project-lux/lux-marklogic/issues/136#issuecomment-2103398285).
  3. Switching to unfiltered searches aligns with our ambition to switch to Optic, which is unfiltered.

Expected Behavior/Solution: Endpoint consumers may specify whether a search is filtered. When they do not, a default applies; initially, that default is filtered. The default is controlled by a new build property, so it can be set when deploying the backend code.
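A minimal sketch of how an endpoint might resolve the effective setting, assuming the consumer passes an optional "true"/"false" request value and the build property's value is already available to the code. The constant `DEFAULT_FILTER_SEARCH_RESULTS` and the function `resolve_search_options` are hypothetical names; the only grounded detail is that MarkLogic's `cts.search` accepts `"filtered"` or `"unfiltered"` in its options.

```python
from typing import List, Optional

# Hypothetical stand-in for the new build property; the real property name
# and how its value reaches the endpoint code are not given in this ticket.
DEFAULT_FILTER_SEARCH_RESULTS = True  # initially filtered, per the ticket


def resolve_search_options(filter_param: Optional[str],
                           default_filtered: bool = DEFAULT_FILTER_SEARCH_RESULTS) -> List[str]:
    """Return the cts.search option token for one request.

    filter_param is the hypothetical raw request value: "true", "false",
    or None when the consumer did not specify it.
    """
    if filter_param is None:
        # Consumer did not choose: fall back to the deploy-time default.
        filtered = default_filtered
    else:
        normalized = filter_param.strip().lower()
        if normalized not in ("true", "false"):
            raise ValueError(f"invalid filter value: {filter_param!r}")
        filtered = normalized == "true"
    # cts.search accepts either "filtered" or "unfiltered" as an option.
    return ["filtered" if filtered else "unfiltered"]
```

Rejecting anything other than true/false, rather than quietly using the default, keeps a typo in a request from silently changing which kind of search runs.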

When unfiltered, searches will be faster but may include additional results because of the following unfiltered search (indexing) limitations. If we find other results that a filtered search excludes, we will need to investigate; we may find that an index change is needed or that there is an additional limitation or nuance. Index changes are beyond the scope of this ticket.

Known unfiltered search limitations:

  1. Unfiltered searches cannot be whitespace-sensitive.
  2. Unfiltered searches cannot be punctuation-sensitive.
  3. Unfiltered searches cannot be case-sensitive when the word or phrase is all lowercase.
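To show why an index alone cannot honor these distinctions, here is a toy Python model, not MarkLogic's tokenizer or indexes. Its "index" keeps only lowercase word tokens, so whitespace, punctuation, and case are lost; a stand-in "filter" then checks the original text exactly, as a case-, whitespace-, and punctuation-sensitive search would. All function names are hypothetical, and the model oversimplifies the case rule: it discards case entirely, whereas limitation 3 applies only to all-lowercase terms.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Toy tokenizer: lowercase word characters only, so whitespace,
    # punctuation, and case information never reach the "index".
    return re.findall(r"\w+", text.lower())


def index_phrase_match(doc: str, phrase: str) -> bool:
    # "Unfiltered": match on the token sequence alone.
    d, p = tokenize(doc), tokenize(phrase)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))


def filtered_phrase_match(doc: str, phrase: str) -> bool:
    # "Filtered": verify against the original text, exactly.
    return phrase in doc


docs = ["Send an e-mail today", "Send an e mail today", "E-MAIL policy"]
phrase = "e-mail"
# Documents the index accepts but the filter would have removed.
false_positives = [d for d in docs
                   if index_phrase_match(d, phrase) and not filtered_phrase_match(d, phrase)]
print(false_positives)  # ['Send an e mail today', 'E-MAIL policy']
```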

Examples of index changes we may need to make:

  1. Unfiltered searches can only be diacritic-sensitive when the “Fast Diacritic Sensitive Searches” index is enabled.
  2. Unfiltered searches can only exclude false positives from phrase and near queries so long as the associated word position index(es) are enabled.
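A toy Python illustration of the second point, assuming a simple inverted index rather than MarkLogic's actual index structures. Without positions, the index can only report documents that contain every word of a phrase; with positions, it can also require the words to be adjacent. `build_index` and `phrase_candidates` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Dict[str, List[int]]]:
    """Toy inverted index: word -> {doc_id: [word positions]}."""
    index: Dict[str, Dict[str, List[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def phrase_candidates(index, phrase: str, use_positions: bool) -> Set[str]:
    words = phrase.lower().split()
    # Documents containing every word: all a position-free index can tell us.
    candidates = set(index.get(words[0], {}))
    for w in words[1:]:
        candidates &= set(index.get(w, {}))
    if not use_positions:
        return candidates
    # With positions, keep only documents where the words occur consecutively.
    matches = set()
    for doc_id in candidates:
        for start in index[words[0]][doc_id]:
            if all(start + k in index[w][doc_id] for k, w in enumerate(words)):
                matches.add(doc_id)
                break
    return matches


docs = {"a": "new york city", "b": "york is new to me"}
idx = build_index(docs)
print(phrase_candidates(idx, "new york", use_positions=False))  # "b" is a false positive
print(phrase_candidates(idx, "new york", use_positions=True))   # {'a'}
```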

Requirements:

Needed for promotion: If an item on the list is not needed, it should be crossed off but not removed.

~- [ ] Wireframe/Mockup - Mike~

UAT/LUX Examples: TODO: compile a list of searches that draw out differences between filtered and unfiltered search results.
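One way such a list could be checked, sketched under assumptions: a hypothetical `run_search(query, filtered)` callable returns the full list of result URIs for a query in either mode, and the harness reports URIs that appear only when unfiltered. The stub data below is invented purely to exercise the function; real calls would go through the backend's search endpoint.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical signature: run_search(query, filtered) -> result URIs.
SearchFn = Callable[[str, bool], List[str]]


def compare_filtering(queries: Iterable[str], run_search: SearchFn) -> Dict[str, List[str]]:
    """Map each query to URIs returned unfiltered but removed by filtering."""
    report: Dict[str, List[str]] = {}
    for q in queries:
        filtered = set(run_search(q, True))
        unfiltered = run_search(q, False)
        extras = [uri for uri in unfiltered if uri not in filtered]
        if extras:
            report[q] = extras
    return report


# Stub results for illustration only; not real LUX data.
FAKE_RESULTS = {
    ("e-mail", True): ["/doc/1"],
    ("e-mail", False): ["/doc/1", "/doc/2", "/doc/3"],
    ("york", True): ["/doc/4"],
    ("york", False): ["/doc/4"],
}
report = compare_filtering(["e-mail", "york"], lambda q, f: FAKE_RESULTS[(q, f)])
print(report)  # {'e-mail': ['/doc/2', '/doc/3']}
```

Comparing full result sets rather than single pages matters here: filtering removes documents after index resolution, so the same page number can hold different documents in the two modes even when no false positive is involved.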

Dependencies/Blocks: This ticket neither blocks nor is blocked by another ticket. Once implemented, it will enable the frontend and middle tier to implement a filtering option.

Related GitHub Issues: Links included above.

Related links: None

Wireframe/Mockup: N/A

brent-hartwig commented 1 month ago

Decided on 12 Jul with @jffcamp, @prowns, and @azaroth42 to implement this, but search results are to remain filtered by default until a mid-July dataset is loaded in both DEV and TST (or in two DEV tenants?).

cc: @clarkepeterf, @roamye

brent-hartwig commented 1 month ago

Started work in the 223-unfiltered-cts-search branch.

brent-hartwig commented 1 month ago

Implemented in PR https://github.com/project-lux/lux-marklogic/pull/231, which was included in release 1.21. The cluster formation (CF) PR is https://git.yale.edu/lux-its/ml-cluster-formation/pull/42.

brent-hartwig commented 1 month ago

Closing as this has reached PROD as part of v1.21.

brent-hartwig commented 1 month ago

On 1 Aug 24, Peter enabled unfiltered searches in DEV's main tenant.

jffcamp commented 1 month ago

Decision made not to test: we were not able to identify tests that surface a difference. MarkLogic only applies filtering when needed, so we will leave filtering on. DEV will be updated to re-enable filtering.

brent-hartwig commented 1 month ago

@jffcamp, were you looking for a difference in results or for better performance? The intent was to speed up search under the presumption that filtering results is unnecessary and that, should we encounter differences in results, we could either address them with an index change or explain them as not covered by the indexes. This is also intended to give the team more time to determine whether unfiltered results work well for LUX in advance of switching to Optic.

roamye commented 3 weeks ago

IT Team Meeting 08/16: We discussed in today's meeting that further investigation is needed before making a decision, such as understanding the impact and how to test.