project-lux / lux-marklogic

Code, issues, and resources related to LUX MarkLogic

Semantic facet optimization idea: estimate for all possible facet values #365

Open brent-hartwig opened 1 week ago

brent-hartwig commented 1 week ago

Problem Description: Copied from #362

Semantic facet requests can time out, even with the 59 seconds they are allotted. It is unknown how likely users are to use these facets if they take more than n seconds to appear.

An extreme example takes at least 4.4 minutes: the responsible collections facet for an estimated 15 million items curated by Yale University Library (YUL), a boil-the-ocean search that LUX is asked to calculate facets for. Using the technique described below, this facet request completed in 10 seconds.

Expected Behavior/Solution: For the two semantic facets we have today (responsible units and collections):

  1. Start with all possible facet values. Collections is the larger of the two, with 64 possible values.
  2. For each facet value, calculate an estimate of the items matching both the user's search criteria and that value.
  3. Filter out facet values whose estimate is zero.
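The three steps above can be sketched as follows. This is a minimal illustration, not the code in facetsLib.mjs: the function name `facetViaEstimates` and its parameters are hypothetical, and the estimator is passed in as a callback so the logic can run outside MarkLogic. Inside MarkLogic the callback would presumably wrap `cts.estimate(cts.andQuery([...]))`; `cts.estimate` answers from the indexes without filtering, which is likely where the speedup comes from, at the cost of possibly over-counting.

```javascript
/**
 * Hypothetical sketch: compute a facet by estimating every possible value.
 * @param {string[]} allFacetValues - every possible facet value (step 1), e.g. the 64 collections
 * @param {(value: string) => number} estimateFor - estimated matches for (search criteria AND value)
 * @returns {{value: string, count: number}[]} facet values with a non-zero estimate
 *
 * Inside MarkLogic, estimateFor might look like this (buildValueQuery is hypothetical):
 *   (value) => cts.estimate(cts.andQuery([baseQuery, buildValueQuery(value)]))
 */
function facetViaEstimates(allFacetValues, estimateFor) {
  return allFacetValues
    // Step 2: one estimate per facet value, combined with the user's criteria.
    .map((value) => ({ value, count: estimateFor(value) }))
    // Step 3: drop values the current search cannot reach.
    .filter(({ count }) => count > 0);
}

// Usage with a stand-in estimator; these counts are made up for illustration.
const fakeCounts = { collA: 120, collB: 0, collC: 4500 };
console.log(facetViaEstimates(Object.keys(fakeCounts), (v) => fakeCounts[v]));
```

The cost grows with the number of possible facet values (one estimate each) rather than with the number of matching items, which is why the approach suits facets with a small, fixed set of values like these two.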

Results for the extreme example using the above technique:

11.3.0-q17-cts-speed-demon firstRun=2815 warmRuns=10 warmMin=238 warmMax=973 warmAvg=332 stddev=215 totalItemsRead=198

All durations are in milliseconds. Even the slowest run (the first one, with empty caches) calculated the responsible collections facet for ~15 million items curated by YUL in 2.815 seconds 💥

Results for the responsible collections facet for 675K items matching the "journals" keyword search:

11.3.0-q17-cts-speed-demon firstRun=1684 warmRuns=10 warmMin=149 warmMax=166 warmAvg=154 stddev=5 totalItemsRead=572

Yep, 1.7 seconds with empty caches :)
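The benchmark lines above share a `label key=value ...` layout. The hypothetical helper below, inferred only from those two lines, parses them and converts the millisecond fields to seconds, showing the cold first run next to the warm average for each example.

```javascript
// Hypothetical parser for "<label> key=value key=value ..." benchmark summary lines.
function parseBenchmarkLine(line) {
  const [label, ...pairs] = line.trim().split(/\s+/);
  const stats = { label };
  for (const pair of pairs) {
    const [key, value] = pair.split("=");
    stats[key] = Number(value);
  }
  return stats;
}

// Durations in these lines are milliseconds.
const toSeconds = (ms) => ms / 1000;

const lines = {
  curatedByYul: "11.3.0-q17-cts-speed-demon firstRun=2815 warmRuns=10 warmMin=238 warmMax=973 warmAvg=332 stddev=215 totalItemsRead=198",
  journals: "11.3.0-q17-cts-speed-demon firstRun=1684 warmRuns=10 warmMin=149 warmMax=166 warmAvg=154 stddev=5 totalItemsRead=572",
};

for (const [name, line] of Object.entries(lines)) {
  const s = parseBenchmarkLine(line);
  // firstRun is the empty-cache run; warmMax is the slowest of the warm runs.
  const slowest = Math.max(s.firstRun, s.warmMax);
  console.log(`${name}: cold ${toSeconds(s.firstRun)} s, warm avg ${toSeconds(s.warmAvg)} s, slowest ${toSeconds(slowest)} s`);
}
```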

Once implemented, I don't believe a semantic facet request would ever time out.

An implementation within the CTS benchmark template is attached as q17-cts-speed-demon.js.txt. It includes CTS queries for both examples; set baseSearchCriteria to either journalsBaseSearchCriteria or curatedByYulBaseSearchCriteria to choose one.

To implement this within LUX, we'd need to update the search criteria in facetsViaSearchConfig.mjs and the approach in facetsLib.mjs.

Requirements: See above.

Needed for promotion: If an item on the list is not needed, it should be crossed off but not removed.

~- [ ] Wireframe/Mockup - Mike~

UAT/LUX Examples:

The following are searches. If you are testing this ticket, monitor how long it takes for the responsible units and collections facets to appear (before and after the implementation).

Dependencies/Blocks:

This ticket neither blocks nor is blocked by another ticket.

Related Github Issues:

Related links:

Wireframe/Mockup:

N/A

brent-hartwig commented 5 days ago

Updated q17-cts-speed-demon.js.txt (above) after comparing the performance of the universal index and a temporary field for the collection set criteria. In this case, the universal index proved faster. The updated query lets you try either approach, provided the field exists. The field's path is /json[type = "Set" and classified_as/equivalent/id = "http://vocab.getty.edu/aat/300025976"]/id.
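To make the field path easier to read, here is a rough plain-JavaScript emulation of what it selects from a single document. It is a sketch under assumptions: the document shape `{ json: { type, id, classified_as: [ { equivalent: [ { id } ] } ] } }` is inferred from the path, the function name is hypothetical, and the sample id is made up. XPath `=` against a sequence is true when any item matches, hence the `some()` calls.

```javascript
// AAT concept URI taken from the field path.
const AAT_CLASSIFICATION = "http://vocab.getty.edu/aat/300025976";

// Treat a missing value as empty and a single object as a one-item array.
const toArray = (x) => (x === undefined || x === null ? [] : [].concat(x));

// Hypothetical: returns the id the field would index for this document, or null.
function fieldValueFor(doc) {
  const json = doc && doc.json;
  if (!json || json.type !== "Set") return null;
  const classified = toArray(json.classified_as).some(
    (c) => c && toArray(c.equivalent).some((e) => e && e.id === AAT_CLASSIFICATION)
  );
  return classified ? (json.id ?? null) : null;
}

// Usage with a made-up document.
const sample = {
  json: {
    type: "Set",
    id: "https://example.org/set/1",
    classified_as: [{ equivalent: [{ id: AAT_CLASSIFICATION }] }],
  },
};
console.log(fieldValueFor(sample)); // https://example.org/set/1
```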

Comparison:

11.3.0-q17-cts-universal-index firstRun=2422 warmRuns=10 warmMin=237 warmMax=289 warmAvg=256 stddev=18 totalItemsRead=198

11.3.0-q17-cts-field-range-index firstRun=2510 warmRuns=10 warmMin=301 warmMax=716 warmAvg=352 stddev=122 totalItemsRead=198
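The numbers above can be compared directly. The snippet below transcribes both lines and computes how much lower each statistic is for the universal index. This is only arithmetic on the posted results: the warm average comes out about 27% lower, and the much smaller warmMax and stddev suggest the universal index was also more consistent across warm runs.

```javascript
// Stats transcribed from the two comparison lines above (all in milliseconds).
const universalIndex = { firstRun: 2422, warmMin: 237, warmMax: 289, warmAvg: 256, stddev: 18 };
const fieldRangeIndex = { firstRun: 2510, warmMin: 301, warmMax: 716, warmAvg: 352, stddev: 122 };

// Percentage by which value a is lower than value b.
function percentLower(a, b) {
  return ((b - a) / b) * 100;
}

const comparison = Object.keys(universalIndex).map((stat) => ({
  stat,
  universal: universalIndex[stat],
  field: fieldRangeIndex[stat],
  percentLower: percentLower(universalIndex[stat], fieldRangeIndex[stat]).toFixed(1),
}));

console.table(comparison);
```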