project-lux / lux-marklogic

Code, issues, and resources related to LUX MarkLogic

Optimize related list requests made through the searchWillMatch endpoint #202

Open brent-hartwig opened 3 months ago

brent-hartwig commented 3 months ago

Problem Description: This is an optimization idea spawned from performance test #181 and copied from this comment. The optimization could render entity pages to users faster. In the performance test, some agent, concept, and place entity page requests took over two seconds and executed seven or more triple searches to find the first item in a related list.

We may be able to optimize searchWillMatch via changes to relatedListsLib.mjs's PRIORITY_BY_RELATION_KEY. That variable may be used to influence the order a related list's triple searches are executed in when only needing to know if a related list has at least one item. This capability is utilized in the frontend entity page context to determine whether to display a UI widget. Below is a formatted version of this test's atLeastOneRelatedListItem.txt (log mining output). Related lists without items are filtered out. Remaining rows are sorted by duration. While duration for the same request can vary based on concurrent demand, note the number of relations checked, in column B. In this context, the fewer the better.

Ultimately, this is about how fast an agent, concept, or place entity page begins to render for a user. Each associated searchWillMatch request includes criteria for four related lists, asking whether they have at least one item.

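[Image: formatted atLeastOneRelatedListItem.txt results; related lists with items, sorted by duration, with the number of relations checked in column B]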

Entire file: 2024-06-18-perf-test-at-least-one-related-list-item.xlsx

Expected Behavior/Solution: Modify the following to enable searchWillMatch to resolve related list requests in fewer triple searches. Note that this is about prioritizing or deprioritizing specific triple searches based on how frequently or infrequently they are expected to return data for most records. The default priority is 3. Avoid focusing on one record per entity as triple searches for some entities may return results but that may not be common.

// Means to favor relations with fewer triples when determining if a related list will have at least one item.
// Relations you wish to process later should be given a number larger than zero (the default).  Original values
// are the number of seconds witnessed for highly referenced documents.  When only determining if a related list
// has at least one item, the higher the number, the later it will be processed, if at all.
const PRIORITY_BY_RELATION_KEY = {
  'classificationOfItem-classification': 4,
  'classificationOfWork-classification': 6,
  'created-classification': 9,
  'createdHere-classification': 8,
  'languageOf-classification': 2,
  'materialOfItem-classification': 3,
  'produced-classification': 6,
  'publishedHere-classification': 4,
  'subjectOfConcept-classification': 7,
  'usedToProduce-classification': 3,
};
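
To illustrate why the priority values affect the number of triple searches, here is a minimal sketch of priority-ordered, short-circuiting evaluation. It is not the code in relatedListsLib.mjs: `orderRelationKeys`, `hasAtLeastOneItem`, and the `searchRelation` callback are hypothetical names, and the default priority of 3 is taken from the issue text above (the code comment mentions zero, so the real default may differ).

```javascript
// Hypothetical sketch; names and default are assumptions, not the actual library code.
const ASSUMED_DEFAULT_PRIORITY = 3;

// Return a copy of relationKeys sorted so that lower priority numbers come first.
// Keys missing from priorityByKey receive the default priority.
function orderRelationKeys(relationKeys, priorityByKey, defaultPriority = ASSUMED_DEFAULT_PRIORITY) {
  const priorityOf = (key) =>
    Object.prototype.hasOwnProperty.call(priorityByKey, key)
      ? priorityByKey[key]
      : defaultPriority;
  // Array.prototype.sort is stable, so keys with equal priority keep their input order.
  return [...relationKeys].sort((a, b) => priorityOf(a) - priorityOf(b));
}

// Run one search per relation in priority order and stop at the first relation
// that reports at least one result. searchRelation(key) stands in for a triple
// search and is expected to return a result count.
function hasAtLeastOneItem(relationKeys, priorityByKey, searchRelation) {
  let searchesRun = 0;
  for (const key of orderRelationKeys(relationKeys, priorityByKey)) {
    searchesRun += 1;
    if (searchRelation(key) > 0) {
      return { hasItem: true, matchedKey: key, searchesRun };
    }
  }
  return { hasItem: false, matchedKey: null, searchesRun };
}

// Usage with a stubbed search: only 'languageOf-classification' has results.
const examplePriorities = {
  'created-classification': 9,
  'languageOf-classification': 2,
};
const exampleResult = hasAtLeastOneItem(
  ['created-classification', 'languageOf-classification'],
  examplePriorities,
  (key) => (key === 'languageOf-classification' ? 1 : 0)
);
console.log(exampleResult.searchesRun); // 1, because the lower-numbered relation ran first
```

Under these assumptions, a relation that usually returns data should get a low number so it is tried early, and one that rarely returns data (or is slow) should get a high number so it is reached only when the others come up empty.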

Requirements: See above

Needed for promotion: If an item on the list is not needed, it should be crossed off but not removed.

~- [ ] Wireframe/Mockup - Mike~

UAT/LUX Examples: Developer would need to identify all impacted entities whose entity pages should be tested.

Dependencies/Blocks: None

Related Github Issues:

#181

Related links: None

Wireframe/Mockup: Not needed

roamye commented 2 months ago

Per UAT - this needs to be discussed with @clarkepeterf / @brent-hartwig to figure out what else is needed for this to be promoted. Is this only about optimizing the back end query?

brent-hartwig commented 2 months ago

@roamye, this is an optimization idea spawned from a performance test finding. We wanted to log it so as not to forget. IMO, it's fine to toss this into the backlog while more important tasks are addressed.