project-lux / lux-marklogic

Code, issues, and resources related to LUX MarkLogic

Explore means to remove the per relation cap imposed on related lists (from 884) #30

Open gigamorph opened 4 months ago

gigamorph commented 4 months ago

Problem Description: As part of #871 and implemented in PR https://git.yale.edu/lux-its/marklogic/pull/883, a maximum number of relationships is imposed on each related list relation. The limit is a workaround for performance problems and timeouts, but it negatively impacts functionality because data is omitted from some responses.

It's not yet clear how many requests the limit impacts (i.e., how many requests have relations that exceed the per relation cap). When the LuxRelatedList trace event is enabled, each occurrence is logged. Information is power! Example entries:

<logfile host="10.5.156.154" filename="8003_ErrorLog.txt" start="2023-04-12T15:45:00" end="" regex="" regex-flags="" xmlns="http://marklogic.com/manage/logs">
    <log timestamp="2023-04-12T15:45:35.970Z" level="info">[Event:id=LuxRelatedList] Hit the max of 250000 relationships for the 'classificationOfWork-classification' relation with scope 'concept', term 'relatedToConcept', and URI 'https://lux.collections.yale.edu/data/concept/7ff533c5-cb3a-4326-9f2c-8fff6a7ac54e'.</log>
    <log timestamp="2023-04-12T15:45:37.666Z" level="info">[Event:id=LuxRelatedList] Hit the max of 250000 relationships for the 'languageOf-classification' relation with scope 'concept', term 'relatedToConcept', and URI 'https://lux.collections.yale.edu/data/concept/7ff533c5-cb3a-4326-9f2c-8fff6a7ac54e'.</log>
    <log timestamp="2023-04-12T15:45:39.830Z" level="info">[Event:id=LuxRelatedList] Hit the max of 250000 relationships for the 'subjectOfConcept-classification' relation with scope 'concept', term 'relatedToConcept', and URI 'https://lux.collections.yale.edu/data/concept/7ff533c5-cb3a-4326-9f2c-8fff6a7ac54e'.</log>
    <log timestamp="2023-04-12T15:45:40.383Z" level="info">[Event:id=LuxRelatedList] Created the 'relatedToConcept' list in scope 'concept' for 'https://lux.collections.yale.edu/data/concept/7ff533c5-cb3a-4326-9f2c-8fff6a7ac54e' in 6255 milliseconds.</log>
</logfile>
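Occurrences like the above can be tallied straight from the error logs. A minimal sketch, assuming the message format shown in the sample entries (the regular expression and function below are illustrative, not part of the codebase):

```javascript
// Count cap hits per relation from LuxRelatedList trace messages.
// The message format is assumed from the sample log entries above.
const capHitPattern =
  /Hit the max of (\d+) relationships for the '([^']+)' relation with scope '([^']+)', term '([^']+)', and URI '([^']+)'/;

function tallyCapHits(messages) {
  const counts = {};
  for (const msg of messages) {
    const match = capHitPattern.exec(msg);
    if (match) {
      const relation = match[2];
      counts[relation] = (counts[relation] || 0) + 1;
    }
  }
  return counts;
}
```

Grouping by relation (rather than URI) highlights which relations most often exceed the cap and thus which would benefit most from its removal.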

Here's more about the restriction, which comes from the relatedList endpoint's documentation of its relationshipsPerRelation parameter:

The maximum number of relationships to process per relation. A related list's definition comprises multiple relations, each of which may resolve to zero or more relationships. Some resolve to more than a million, potentially impacting performance to the extent that the request times out. To avoid timeouts, a maximum number of relationships is applied per relation, meaning the maximum number of relationships processed per request is that cap multiplied by the related list's number of relations. The per relation default is likely 250,000 but is set by the relatedListPerRelationDefault build property. The maximum that cannot be exceeded is likely 500,000 but is set by the relatedListPerRelationMax build property. If a value larger than the allowed maximum is specified, the request proceeds but the allowed maximum is applied.
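The clamping behavior the documentation describes can be sketched as follows. The constants mirror the likely values of the quoted build properties; the functions themselves are illustrative, not the endpoint's actual implementation:

```javascript
// Illustrative sketch of the documented per-relation cap behavior.
// Defaults mirror the likely values of the relatedListPerRelationDefault
// and relatedListPerRelationMax build properties.
const PER_RELATION_DEFAULT = 250000;
const PER_RELATION_MAX = 500000;

function effectivePerRelationCap(requested) {
  // No value requested: fall back to the build-property default.
  if (requested === undefined || requested === null) {
    return PER_RELATION_DEFAULT;
  }
  // Values above the allowed maximum are clamped; the request still proceeds.
  return Math.min(requested, PER_RELATION_MAX);
}

// Per the documentation, the per-request ceiling is the effective cap
// multiplied by the related list's number of relations.
function maxRelationshipsPerRequest(requested, relationCount) {
  return effectivePerRelationCap(requested) * relationCount;
}
```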

Expected Behavior/Solution: Dependent on research for removing the per relation cap.

If there is a solution based on research: Determine how to remove the per relation cap, and create a follow-up ticket to implement the actual code to remove it.

If there is NOT a solution based on research: A means to remove the per relation cap is not discovered. Document findings in this ticket. No follow-up ticket.

Requirements:

Needed for promotion: If an item on the list is not needed, it should be crossed off but not removed.

- [ ] Wireframe/Mockup - Heather
- [ ] Committee discussions - Sarah

UAT/LUX Examples:

Dependencies/Blocks:

Related Github Issues:

Related links:

roamye commented 3 months ago

@clarkepeterf @prowns What is the expected behavior here? That once research is complete, the per relation cap will no longer be imposed on related lists?

Additionally, what are the requirements? Research is one requirement, but are there other things that need to be done besides it?

clarkepeterf commented 3 months ago

I think the expected outcome is either:

  1. Determine how to remove the per relation cap, and create a follow-up ticket to implement the actual code to remove it OR
  2. A means to remove the per relation cap is not discovered. Document findings in this ticket. No follow up ticket.

brent-hartwig commented 3 months ago

@jffcamp, I wanted to find out how often this is happening in production, which may inform how aggressively we go after this. The source of the following information is nine days of production logs (Green, 26 Mar - 3 Apr). Per the Green-Blue switch record, Green was PROD during this period.

I then came up with a query that allowed me to count the number of triples for each of these requests. Findings:

So what are we asking of the system? Each related list has 13 to 18 relations, excluding one outlier with five relations (relation counts by related list are at the bottom of this comment). While processing a related list request, the triple store is accessed once per relation (13 to 18 times, typically). If the request is for a moderately to highly related document, we're pulling millions of triples, and then custom code groups the triples by related entity, maintains counts of how each entity is related to another, sorts the related entities by the number of times they are related to the specified entity, and finally paginates the results.
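The group-count-sort-paginate step described above might look roughly like this. The triple shape (`relatedUri`) and function name are assumptions for illustration; the real custom code operates on MarkLogic triples:

```javascript
// Sketch of the per-request aggregation described above: group triples by
// related entity, count how often each appears, sort by count descending,
// then paginate.
function relatedEntityPage(triples, page, pageLength) {
  // Group and count: one entry per related entity URI.
  const countsByEntity = new Map();
  for (const { relatedUri } of triples) {
    countsByEntity.set(relatedUri, (countsByEntity.get(relatedUri) || 0) + 1);
  }
  // Sort by how often each entity is related to the specified entity.
  const sorted = [...countsByEntity.entries()]
    .map(([uri, count]) => ({ uri, count }))
    .sort((a, b) => b.count - a.count);
  // Paginate (1-based page numbers).
  const start = (page - 1) * pageLength;
  return sorted.slice(start, start + pageLength);
}
```

This makes the cost visible: every triple pulled from the store must be touched at least once before the first page can be returned, which is why pulling millions of triples per request risks timeouts.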

OK, so what options do we have? These are not necessarily mutually exclusive.

  1. Increase the cap. Completely within our control but we can only go so high before encountering timeouts.
  2. Increase the timeout. Completely within our control. We already have a build property that controls the maximum allowed for a related list request but would have to bump up the app server timeout if not also that of the load balancer.
  3. Increase system resource levels. If we are interested enough, or can piggyback on another reason for testing in a larger environment, it would be interesting to see whether the cap can be removed altogether.
  4. Pursue an optimization within the custom code. I have not reviewed the code since the 2023 optimizations but did just ask a related question internally: are there techniques that can increase parallel processing within the triple store for a single request?
  5. Request a product enhancement.
  6. To round out the options, question whether a configuration change at the MarkLogic or OS level could help.

Spreadsheet containing data from the logs: hit-max-green.xlsx. There are two tabs. @azaroth42 could probably provide some insights on the specific documents and their relations that hit the cap (2nd tab). The first tab includes the start of a broader log analysis: messages containing "XDMP" in the 8003, 8004, and main logs for the same period ("XDMP" is the prefix of MarkLogic error codes). I'll reach out separately to see if you want that investigated more.

Script used to get the triple count of a relation: getTripleCountOfOneRelation.js.txt

Number of relations per related list:

{
   "agent":{
      "relatedToAgent":13,
      "relatedToConcept":15,
      "relatedToEvent":5,
      "relatedToPlace":13
   },
   "concept":{
      "relatedToAgent":15,
      "relatedToConcept":18,
      "relatedToPlace":15
   },
   "place":{
      "relatedToAgent":13,
      "relatedToConcept":15,
      "relatedToPlace":13
   }
}
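Combined with the likely default cap of 250,000, these relation counts imply a sizable per-request ceiling; for example, a 'concept'/'relatedToConcept' list with 18 relations could process up to 18 × 250,000 = 4,500,000 relationships. A quick sketch using the counts above (the `worstCase` helper is hypothetical, for illustration only):

```javascript
// Worst-case relationships processed per request, combining the relation
// counts above with the likely default per-relation cap of 250,000.
const relationCounts = {
  agent: { relatedToAgent: 13, relatedToConcept: 15, relatedToEvent: 5, relatedToPlace: 13 },
  concept: { relatedToAgent: 15, relatedToConcept: 18, relatedToPlace: 15 },
  place: { relatedToAgent: 13, relatedToConcept: 15, relatedToPlace: 13 }
};
const PER_RELATION_DEFAULT = 250000;

// Ceiling for one related list request: relations × per-relation cap.
function worstCase(scope, term) {
  return relationCounts[scope][term] * PER_RELATION_DEFAULT;
}
```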

cc: @clarkepeterf, @prowns, @roamye