gigamorph opened 4 months ago
@clarkepeterf @prowns What is the expected behavior here? That once research is complete, the per-relation cap will no longer be imposed on the related lists?
Additionally, what are the requirements? Research is one requirement, but is there anything else that needs to be done besides this?
I think the expected outcome is either:
@jffcamp, I wanted to find out how often this is happening in production, which may inform how aggressively we go after this one. The source of the following information is nine days of production logs from Green (26 Mar - 3 Apr). Per the Green-Blue switch record, Green was PROD during this period.
I then came up with a query that allowed me to count the number of triples for each of these requests. Findings:
So what are we asking of the system? Each related list has 13 to 18 relations, excluding one outlier with five relations (relation counts by related list are at the bottom of this comment). While processing a related list request, the triple store is accessed once per relation in the related list (typically 13 to 18 times). If the request is for a moderately to highly related document, we're pulling millions of triples and then running custom code to group the triples by related entity, maintain counts of how each entity is related to another, sort the related entities by the number of times each is related to the specified entity, and finally paginate the results.
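For reference, the post-processing described above (group, count, sort, paginate) can be sketched in plain JavaScript. This is an illustration only, not the actual LUX implementation; the triple shape `{relatedId, relation}` and the function name are assumptions:

```javascript
// Group triples by related entity, count how often each entity is related,
// sort entities by that count (descending), and paginate the result.
// Assumed triple shape: { relatedId, relation } — hypothetical, for illustration.
function summarizeRelatedList(triples, page = 1, pageLength = 20) {
  const byEntity = new Map(); // relatedId -> { relatedId, count, relations }
  for (const t of triples) {
    let entry = byEntity.get(t.relatedId);
    if (!entry) {
      entry = { relatedId: t.relatedId, count: 0, relations: new Set() };
      byEntity.set(t.relatedId, entry);
    }
    entry.count += 1;          // how many times this entity is related
    entry.relations.add(t.relation); // which relations connect it
  }
  const sorted = [...byEntity.values()].sort((a, b) => b.count - a.count);
  const start = (page - 1) * pageLength;
  return sorted.slice(start, start + pageLength).map((e) => ({
    relatedId: e.relatedId,
    count: e.count,
    relations: [...e.relations],
  }));
}
```

The expensive part in production is not this in-memory pass but pulling millions of triples out of the triple store in the first place, once per relation.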
OK, so what options do we have? These are not necessarily mutually exclusive.
Spreadsheet containing data from the logs: hit-max-green.xlsx. There are two tabs. @azaroth42 could probably provide some insights on the specific documents and their relations that hit the cap (2nd tab). The first tab includes the start of a broader log analysis: messages containing "XDMP" in the 8003, 8004, and main logs for the same period. "XDMP" is the prefix of error codes. I'll reach out separately to see if you want that investigated more.
Script used to get the triple count of a relation: getTripleCountOfOneRelation.js.txt
Number of relations per related list:
```json
{
  "agent": {
    "relatedToAgent": 13,
    "relatedToConcept": 15,
    "relatedToEvent": 5,
    "relatedToPlace": 13
  },
  "concept": {
    "relatedToAgent": 15,
    "relatedToConcept": 18,
    "relatedToPlace": 15
  },
  "place": {
    "relatedToAgent": 13,
    "relatedToConcept": 15,
    "relatedToPlace": 13
  }
}
```
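Totaling the counts above gives a rough upper bound on triple-store accesses for a single related-list request per scope. A quick sketch (just summing the JSON above; not production code):

```javascript
// Total relations per related-list scope, from the relation counts above.
const relationCounts = {
  agent:   { relatedToAgent: 13, relatedToConcept: 15, relatedToEvent: 5, relatedToPlace: 13 },
  concept: { relatedToAgent: 15, relatedToConcept: 18, relatedToPlace: 15 },
  place:   { relatedToAgent: 13, relatedToConcept: 15, relatedToPlace: 13 },
};

const totals = Object.fromEntries(
  Object.entries(relationCounts).map(([scope, rels]) => [
    scope,
    Object.values(rels).reduce((sum, n) => sum + n, 0),
  ])
);
// totals: { agent: 46, concept: 48, place: 41 }
```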
cc: @clarkepeterf, @prowns, @roamye
Problem Description: As part of #871 and implemented in PR https://git.yale.edu/lux-its/marklogic/pull/883, a maximum number of relationships per related-list relation is imposed. The limit is a performance and timeout workaround that negatively impacts functionality, as data is omitted from some requests.
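Conceptually, the cap amounts to truncating each relation's result set before the relations are merged into the related list. A minimal sketch of the idea (hypothetical shapes and names, not the actual code from the PR):

```javascript
// Apply a per-relation cap before merging relation results.
// relationResults: Map of relation name -> array of related-entity IDs (assumed shape).
function applyPerRelationCap(relationResults, relationshipsPerRelation) {
  const capped = new Map();
  let hitMax = false; // true if any relation exceeded the cap
  for (const [relation, ids] of relationResults) {
    if (ids.length > relationshipsPerRelation) {
      hitMax = true; // this is the occurrence a trace event could log
      capped.set(relation, ids.slice(0, relationshipsPerRelation));
    } else {
      capped.set(relation, ids);
    }
  }
  return { capped, hitMax };
}
```

The functional cost is visible here: everything past `relationshipsPerRelation` entries in a relation is silently dropped, which is exactly the omitted data described above.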
It's not yet clear how many requests the limit impacts (i.e., how many requests have relations that exceed the per-relation cap). When enabled, a LuxRelatedList trace event will log each occurrence. Information is power! Example entries:
Here's more about the restriction, which comes from the relatedList endpoint's documentation of its `relationshipsPerRelation` parameter:

Expected Behavior/Solution: Dependent on research for removing the per-relation cap.
If there IS a solution based on the research: determine how to remove the per-relation cap, and create a follow-up ticket to implement the actual code change.
If there is NOT a solution based on the research: a means to remove the per-relation cap was not discovered. Document the findings in this ticket. No follow-up ticket.
Requirements:
Needed for promotion: If an item on the list is not needed, it should be crossed off but not removed.
- [ ] Wireframe/Mockup - Heather
- [ ] Committee discussions - Sarah

UAT/LUX Examples:
Dependencies/Blocks:
Related Github Issues:
Related links: