Closed brent-hartwig closed 1 week ago
The specific backend branch being tested is 161-paginate-via-search-facet-requests. I believe it to be a branch of 160-paginate-non-semantic-facet-requests, which is a branch of release1.18.
In case this test is a raging success and we have time for a second test, we could re-run https://github.com/project-lux/lux-marklogic/issues/132: include alternative names when resolving name search criteria. I created the 161-pagination-PLUS-100-alternative-names branch which is the above plus #100's edits.
cc: @gigamorph, @kamerynB, @prowns, @jffcamp, @clarkepeterf
Middle tier stats: 20240607-tst-middle-tier-stats.txt
ML load balancer:
CPU utilization of ML nodes:
504s from ML load balancer:
504s from web cache:
Trimmed backend log files:
20240607-green-as-test-node-104-8003-AccessLog-trimmed.zip 20240607-green-as-test-node-104-8003-ErrorLog-trimmed.zip 20240607-green-as-test-node-104-8003-RequestLog-trimmed.zip 20240607-green-as-test-node-104-8004-AccessLog-trimmed.zip 20240607-green-as-test-node-104-8004-ErrorLog-trimmed.zip 20240607-green-as-test-node-104-8004-RequestLog-trimmed.zip 20240607-green-as-test-node-104-ErrorLog-trimmed.zip 20240607-green-as-test-node-73-8003-AccessLog-trimmed.zip 20240607-green-as-test-node-73-8003-ErrorLog-trimmed.zip 20240607-green-as-test-node-73-8003-RequestLog-trimmed.zip 20240607-green-as-test-node-73-8004-AccessLog-trimmed.zip 20240607-green-as-test-node-73-8004-ErrorLog-trimmed.zip 20240607-green-as-test-node-73-8004-RequestLog-trimmed.zip 20240607-green-as-test-node-73-ErrorLog-trimmed.zip 20240607-green-as-test-node-20-8003-AccessLog-trimmed.zip 20240607-green-as-test-node-20-8003-ErrorLog-trimmed.zip 20240607-green-as-test-node-20-8003-RequestLog-trimmed.zip 20240607-green-as-test-node-20-8004-AccessLog-trimmed.zip 20240607-green-as-test-node-20-8004-ErrorLog-trimmed.zip 20240607-green-as-test-node-20-8004-RequestLog-trimmed.zip 20240607-green-as-test-node-20-ErrorLog-trimmed.zip
There were fewer backend requests than expected --yet there are not significant differences between the total number of executed Virtual User (VU) flows/transactions, overall error rate, and average response times with this test's baseline (22 Jun 23) and a more recent test (14 Feb 24). Test duration doesn't explain it either. Comparison:
J (1033) | AG (1145) | AK | AK vs. J | AK vs. AG | |
---|---|---|---|---|---|
Date | 22 Jun 23 | 14 Feb 24 | 7 Jun 24 | N/A | N/A |
App server thread count(s) | 6 & 12 | 18 | 6 & 12 | N/A | N/A |
Backend search request count (200 status code) | 27,438 | 12,214 | 2,758 | -90% | -77% |
Backend facet request count (200 status code) | 82,687 | 63,367 | 32,348 | -61% | -49% |
Total backend request count (all status codes) | 1,016,344 | 688,599 | 307,511 | -70% | -55% |
Frontend response count with status code in the 200s | 1,046,953 | 1,046,953 | 830,098 | -21% | -21% |
Total frontend response count (all status codes) | 1,173,748 | 1,173,748 | 1,073,190 | -9% | -9% |
Backend requests per frontend response | 0.87 | 0.59 | 0.29 | -67% | -51% |
Frontend response per backend request | 1.15 | 1.70 | 3.49 | 202% | 105% |
Total middle tier request count (all status codes) | UNK | UNK | 363,788 | UNK | UNK |
Backend requests per middle tier request | UNK | UNK | 0.85 | UNK | UNK |
Middle tier requests per backend request | UNK | UNK | 1.18 | UNK | UNK |
Total number of executed VU flows | 28,573 | 36,729 | 32,230 | 3,657 | -4,499 |
Error rate | 2.42% | 5.42% | 4.41% | 1.99% | -1.01% |
Average, unweighted VU flow response time (secs) | 2.262 | 2.941 | 3.297 | 1.035 | 0.356 |
Test duration (HH:MM) | 1:55* | 2:12 | 2:15 | 00:20* | 00:03 |
If the system isn't being pushed as hard as previous tests, the results of this test could be skewed. Regardless, today's test was so much better than those on 9 May 24 (#132) and 22 May 24 (#151) as there were zero v8 engine crashes!
Possible explanations thought of to date (not mutually-exclusive):
* Logs were trimmed to 1:55. The QA report states the test ran for 2:15. Found a note that read "Test stopped after full ramp up but did not run for the full 15 minutes afterwards." Regardless, for the purposes of this comparison, 22 Jun 23's test running shorter would skew the results to the benefit of 7 Jun 24's test.
@gigamorph noticed early on that there were thousands of 504s (gateway timeouts) from the ML load balancer early on. As shown above, that persisted throughout the test but 504s at the web cache later never exceeded 100 per minute. The likely explanation is that the data service proxies automatically retried the 504s. Looking at this and previous tests' app server metrics, a contributing factor could be a full request queue, for the lux-request-group-1
app server. That app server's queue reported being full 82% of the minutely data points, implying it was maxed out for quite a bit of the test. Given the other app server's queue was not used at all, we should consider redirecting one or more request types to it.
Time period: 1700 - 1920 UTC
CPU:
File IO Detail:
Memory:
Intra-cluster activity, 1 of 2:
Intra-cluster activity, 2 of 2:
Both HTTP app servers:
Just the lux-request-group-1
app server (all requests here except search and relatedList requests, up to 6 threads per node):
Note the difference in the y
axes for lux-request-group-1
and lux-request-group-2
.
Just the lux-request-group-2
app server (only search and relatedList requests, up to 12 threads per node):
Note the difference in the y
axes for lux-request-group-1
and lux-request-group-2
.
Data node characteristics for the lux-content
database alone:
The following reflects group one's queue was full for the majority of the test. This could explain many of the 504s the MarkLogic load balancer is returning. Please see this above comment for more details and suggested follow-up.
Script: extractAppServerQueueMetrics.js*
Input: 20240607-1700-1920-app-server-stats.json
Output:
[
{
"appServer":"lux-request-group-1",
"largestQueueSize":36,
"largestQueueSizeCount":115,
"greaterThanOneCount":125,
"dataPointCount":141,
"totalQueuedRequestCount":4327,
"values":"omitted per script configuration"
},
{
"appServer":"lux-request-group-2",
"largestQueueSize":0,
"largestQueueSizeCount":141,
"greaterThanOneCount":0,
"dataPointCount":141,
"totalQueuedRequestCount":0,
"values":"omitted per script configuration"
}
]
* The input includes one instance of a queue size of 37, which exceeds the maximum. I changed the script to use 36 for group one, then added one to that total to account for the single report of 37.
OS-level metrics (via sar
):
These were downloaded from Green's nodes but filenames do not reflect nodes 104, 73, 20.
sar_ip-10-5-156-171.its.yale.edu_2024-06-07T165241.out.gz sar_ip-10-5-156-171.its.yale.edu_2024-06-07T165241_screen.out.gz sar_ip-10-5-157-195.its.yale.edu_2024-06-07T165318.out.gz sar_ip-10-5-157-195.its.yale.edu_2024-06-07T165318_screen.out.gz sar_ip-10-5-254-21.its.yale.edu_2024-06-07T165336.out.gz sar_ip-10-5-254-21.its.yale.edu_2024-06-07T165336_screen.out.gz
Looks like two requests failed due to an attempt to send too much data between nodes:
20240607-green-as-test-node-104-ErrorLog-trimmed.txt:2024-06-07 17:43:31.069 Warning: XDQPServerConnection::init: 172.18.94.14 XDMP-XDQPINVREQ: Invalid XDQP request, size=5388703008, client=0, conn=10.5.156.104:7999-172.18.94.14:50564 20240607-green-as-test-node-73-ErrorLog-trimmed.txt:2024-06-07 18:29:36.054 Warning: XDQPServerConnection::init: 172.18.94.12 XDMP-XDQPINVREQ: Invalid XDQP request, size=5388703008, client=0, conn=10.5.157.73:7999-172.18.94.12:48512
Which line up with these entries:
20240607-green-as-test-node-104-8004-ErrorLog-trimmed.txt:2024-06-07 17:43:36.153 Notice: SVC-SOCPEER: Socket peer name error: getpeername 10.5.156.104:8004: Transport endpoint is not connected 20240607-green-as-test-node-73-8004-ErrorLog-trimmed.txt:2024-06-07 18:29:45.069 Notice: SVC-SOCPEER: Socket peer name error: getpeername 10.5.157.73:8004: Transport endpoint is not connected
Encountered five JS-FATAL
errors. They all match the following pattern, meaning they were all for searchWillMatch
requests. These were found in the 8003 application log files, as opposed to the v8 engine crash log entries we're accustomed to finding in the main error logs.
2024-06-07 18:26:59.300 Notice: JS-FATAL: xdmp:eval("\nconst op = require(\"/MarkLogic/optic\");\nconst crm = op.pref...", {}, {database:"7947491399340755119", update:"false"}) -- Request has fatal error: v8::Object::SetAlignedPointerInInternalField()
2024-06-07 18:26:59.300 Notice:+in xdmp:eval("\nconst op = require(\"/MarkLogic/optic\");\nconst crm = op.pref...", {}, {database:"7947491399340755119", update:"false"}) [javascript]
2024-06-07 18:26:59.300 Notice:+in /ds/lux/searchWillMatch.mjs [javascript]
2024-06-07 18:26:59.300 Notice:+in /ds/lux/searchWillMatch.mjs [javascript]
2024-06-07 18:26:59.300 Notice: JS-BAD: Unexpected failure: checkMaybeLocal wrapObjectNode objnObj
2024-06-07 18:26:59.300 Notice:+in /luxErrorHandler.mjs [javascript]
Nodes and times associated to all instances:
$ grep JS-FATAL 20240607-green-as-test-node-*-800[3-4]-ErrorLog-trimmed.txt
20240607-green-as-test-node-104-8003-ErrorLog-trimmed.txt:2024-06-07 17:30:33.370 Notice: JS-FATAL: xdmp:eval("\nconst op = require(\"/MarkLogic/optic\");\nconst crm = op.pref...", {}, {database:"7947491399340755119", update:"false"}) -- Request has fatal error: v8::Object::SetAlignedPointerInInternalField()
20240607-green-as-test-node-104-8003-ErrorLog-trimmed.txt:2024-06-07 18:32:31.255 Notice: JS-FATAL: xdmp:eval("\nconst op = require(\"/MarkLogic/optic\");\nconst crm = op.pref...", {}, {database:"7947491399340755119", update:"false"}) -- Request has fatal error: v8::Object::SetAlignedPointerInInternalField()
20240607-green-as-test-node-104-8003-ErrorLog-trimmed.txt:2024-06-07 18:58:53.361 Notice: JS-FATAL: xdmp:eval("\nconst op = require(\"/MarkLogic/optic\");\nconst crm = op.pref...", {}, {database:"7947491399340755119", update:"false"}) -- Request has fatal error: v8::Object::SetAlignedPointerInInternalField()
20240607-green-as-test-node-20-8003-ErrorLog-trimmed.txt:2024-06-07 18:52:31.780 Notice: JS-FATAL: xdmp:eval("\nconst op = require(\"/MarkLogic/optic\");\nconst crm = op.pref...", {}, {database:"7947491399340755119", update:"false"}) -- Request has fatal error: v8::Object::SetAlignedPointerInInternalField()
20240607-green-as-test-node-73-8003-ErrorLog-trimmed.txt:2024-06-07 18:26:59.300 Notice: JS-FATAL: xdmp:eval("\nconst op = require(\"/MarkLogic/optic\");\nconst crm = op.pref...", {}, {database:"7947491399340755119", update:"false"}) -- Request has fatal error: v8::Object::SetAlignedPointerInInternalField()
cc: @clarkepeterf
While there is a JS-BAD
instance for each above-discussed JS-FATAL
instance, this comment speaks to the other JS-BAD
instances. This subset is related to semantic facet requests. Each instance is provided below. Because they all fail after one minute and the request and app server timeouts are both 59 seconds, I believe these are simply timed-out requests. I also checked 22 Jun 23's logs and found four instances; just the "Failed to calculate" log entries are included below, at the end.
The response status code for all was 500, meaning the data service proxies would not have retried them.
2024-06-07 18:27:16.422 Info: [Event:id=LuxFacets] Searching for the 'responsibleUnits' facet values.
2024-06-07 18:27:16.423 Info: [Event:id=LuxSearch] Search 4295481604482034528 parameters: {"searchCriteria":{"_scope":"agent","OR":[{"memberOfInverse":{"AND":[{"curated":{"containingItem":{"text":"booked","_lang":"en"}}},{"classification":{"id":"https://lux.collections.yale.edu/data/concept/5c749df9-4ae0-48ad-b8e3-f120af5689c3"}}]}},{"AND":[{"curated":{"containingItem":{"text":"booked","_lang":"en"}}},{"NOT":{"classification":{"id":"https://lux.collections.yale.edu/data/concept/5c749df9-4ae0-48ad-b8e3-f120af5689c3"}}}]}]},"searchScope":null,"mayChangeScope":false,"page":1,"pageLength":101,"requestContext":"viaSearchFacet","mayExceedMaximumPageLength":true,"mayEstimate":true,"sortDelimitedStr":"","facetsSoon":false,"synonymsEnabled":false}
...
2024-06-07 18:28:15.099 Info: [Event:id=LuxSearch] Search 4295481604482034528 errored out: {"stack":"JS-BAD: .toArray() -- Unexpected failure: checkMaybeLocal wrapValue valObj\nin [anonymous], at 44:5 [javascript]\nin ./utils/utils.mjs, at 111:14, in evalInContentDatabase() [javascript]\nin ./SearchCriteriaProcessor.mjs, at 188:17, in getSearchResults() [javascript]\nin ./searchLib.mjs, at 178:38, in _search() [javascript]\nin ./searchLib.mjs, at 81:9, in search() [javascript]\nin ../../lib/facetsLib.mjs, at 313:17, in _getViaSearchFacetValues() [javascript]\nin ../../lib/facetsLib.mjs, at 269:17, in _getViaSearchFacets() [javascript]\nin ../../lib/facetsLib.mjs, at 52:37, in getFacets() [javascript]\nin /ds/lux/facets.mjs, at 12:9 [javascript]\nin ../../lib/requestHandleLib.mjs, at 10:9, in handleRequest() [javascript]\nin /ds/lux/facets.mjs, at 11:0 [javascript]\nin xdmp:eval(\"\\nconst op = require(\\\"/MarkLogic/optic\\\");\\nconst crm = op.pref...\", {}, {database:\"7947491399340755119\", update:\"false\"}) [javascript]\nin /ds/lux/facets.mjs [javascript]\nin /ds/lux/facets.mjs [javascript]","mlerr":{},"data":["checkMaybeLocal","wrapValue valObj"],"message":"Unexpected failure","name":"JS-BAD","retryable":false,"stackFrames":[{"uri":"[anonymous]","line":44,"column":5,"operation":".toArray()","language":"javascript"},{"uri":"./utils/utils.mjs","line":111,"column":14,"operation":"evalInContentDatabase","language":"javascript","isEval":false},{"uri":"./SearchCriteriaProcessor.mjs","line":188,"column":17,"operation":"getSearchResults","language":"javascript","isEval":false},{"uri":"./searchLib.mjs","line":178,"column":38,"operation":"_search","language":"javascript","isEval":false},{"uri":"./searchLib.mjs","line":81,"column":9,"operation":"search","language":"javascript","isEval":false},{"uri":"../../lib/facetsLib.mjs","line":313,"column":17,"operation":"_getViaSearchFacetValues","language":"javascript","isEval":false},{"uri":"../../lib/facetsLib.mjs","line":269,"column":17,"operation":"_getViaSearchFacets","language":"javascript","isEval":false},{"uri":"../../lib/facetsLib.mjs","line":52,"column":37,"operation":"getFacets","language":"javascript","isEval":false},{"uri":"/ds/lux/facets.mjs","line":12,"column":9,"language":"javascript","isEval":false},{"uri":"../../lib/requestHandleLib.mjs","line":10,"column":9,"operation":"handleRequest","language":"javascript","isEval":false},{"uri":"/ds/lux/facets.mjs","line":11,"column":0,"operation":"xdmp:eval(\"\\nconst op = require(\\\"/MarkLogic/optic\\\");\\nconst crm = op.pref...\", {}, {database:\"7947491399340755119\", update:\"false\"})","language":"javascript","isEval":false}]}
2024-06-07 18:28:15.099 Info: [Event:id=LuxSearch] {"requestId":"4295481604482034528", "requestCompleted":false, "milliseconds":{"total":58676, "process":1}, "estimate":0, "page":1, "pageLength":101, "requestContext":"viaSearchFacet", "returned":0, "scope":"agent", "changedScope":false, "criteria":{"OR":[{"memberOfInverse":{"AND":[{"curated":{"containingItem":{"text":"booked", "_lang":"en"}}}, {"classification":{"id":"https://lux.collections.yale.edu/data/concept/5c749df9-4ae0-48ad-b8e3-f120af5689c3"}}]}}, {"AND":[{"curated":{"containingItem":{"text":"booked", "_lang":"en"}}}, {"NOT":{"classification":{"id":"https://lux.collections.yale.edu/data/concept/5c749df9-4ae0-48ad-b8e3-f120af5689c3"}}}]}]}, "ignoredTerms":[], "query":"\nconst op = require(\"/MarkLogic/optic\");\nconst crm = op.prefixer(\"http://www.cidoc-crm.org/cidoc-crm/\");\nconst la = op.prefixer(\"https://linked.art/ns/terms/\");\nconst lux = op.prefixer(\"https://lux.collections.yale.edu/ns/\");\nconst skos = op.prefixer(\"http://www.w3.org/2004/02/skos/core#\");\n const docs = fn.subsequence(\n cts.search(cts.andQuery([\n cts.jsonPropertyValueQuery('dataType', [\"Person\", \"Group\"], ['exact']),\n cts.orQuery([cts.documentQuery(cts\n .triples(\n [],\n [crm(\"P107i_is_current_or_former_member_of\")],\n [],\n '=',\n ['eager', 'concurrent'],\n cts.andQuery([cts.documentQuery(cts\n .triples(\n cts\n .triples(\n [],\n [la(\"member_of\")],\n [],\n '=',\n ['eager', 'concurrent'],\n cts.orQuery([\n cts.fieldWordQuery([\"itemAnyText\"], \"booked\", [\"case-insensitive\", \"diacritic-insensitive\", \"punctuation-insensitive\", \"whitespace-insensitive\", \"stemmed\", \"wildcarded\"], 1),\n cts.tripleRangeQuery(\n [],\n [lux(\"itemAny\")],\n fn.insertBefore(\n cts.values(\n cts.iriReference(),\n '',\n ['eager', 'concurrent'],\n cts.fieldWordQuery([\"referencePrimaryName\"], \"booked\", [\"case-insensitive\", \"diacritic-insensitive\", \"punctuation-insensitive\", \"whitespace-insensitive\", \"stemmed\", \"wildcarded\"], 1)\n ), \n 0,\n sem.iri('/does/not/exist')\n ), '=', [], 1\n )\n ])\n )\n .toArray()\n .map((x) => sem.tripleObject(x))\n .concat(sem.iri('/does/not/exist')),\n [lux(\"agentOfCuration\")],\n [],\n '=',\n ['eager', 'concurrent'],\n undefined\n )\n .toArray()\n .map((x) => sem.tripleObject(x))\n .concat(sem.iri('/does/not/exist'))), cts.fieldValueQuery(\n [\"agentTypeId\"],\n [\"https://lux.collections.yale.edu/data/concept/5c749df9-4ae0-48ad-b8e3-f120af5689c3\"],\n [\"exact\"],\n 1\n )])\n )\n .toArray()\n .map((x) => sem.tripleObject(x))\n .concat(sem.iri('/does/not/exist'))), cts.andQuery([cts.documentQuery(cts\n .triples(\n cts\n .triples(\n [],\n [la(\"member_of\")],\n [],\n '=',\n ['eager', 'concurrent'],\n cts.orQuery([\n cts.fieldWordQuery([\"itemAnyText\"], \"booked\", [\"case-insensitive\", \"diacritic-insensitive\", \"punctuation-insensitive\", \"whitespace-insensitive\", \"stemmed\", \"wildcarded\"], 1),\n cts.tripleRangeQuery(\n [],\n [lux(\"itemAny\")],\n fn.insertBefore(\n cts.values(\n cts.iriReference(),\n '',\n ['eager', 'concurrent'],\n cts.fieldWordQuery([\"referencePrimaryName\"], \"booked\", [\"case-insensitive\", \"diacritic-insensitive\", \"punctuation-insensitive\", \"whitespace-insensitive\", \"stemmed\", \"wildcarded\"], 1)\n ), \n 0,\n sem.iri('/does/not/exist')\n ), '=', [], 1\n )\n ])\n )\n .toArray()\n .map((x) => sem.tripleObject(x))\n .concat(sem.iri('/does/not/exist')),\n [lux(\"agentOfCuration\")],\n [],\n '=',\n ['eager', 'concurrent'],\n undefined\n )\n .toArray()\n .map((x) => sem.tripleObject(x))\n .concat(sem.iri('/does/not/exist'))), cts.notQuery(cts.fieldValueQuery(\n [\"agentTypeId\"],\n [\"https://lux.collections.yale.edu/data/concept/5c749df9-4ae0-48ad-b8e3-f120af5689c3\"],\n [\"exact\"],\n 1\n ))])])\n ]), [\"unfaceted\"]), \n 1, \n 101\n ).toArray();\n const results = [];\n for (const doc of docs) {\n results.push({id: doc.baseURI, type: doc.xpath('/json/type')});\n }\n results;"}
2024-06-07 18:28:15.099 Info: [Event:id=LuxFacets] Failed to calculate the following facets after 58690 milliseconds: responsibleUnits
2024-06-07 18:28:15.100 Notice: JS-BAD: .toArray() -- Unexpected failure: checkMaybeLocal wrapValue valObj
2024-06-07 18:28:15.100 Notice:+in [anonymous], at 44:5 [javascript]
2024-06-07 18:28:15.100 Notice:+in ./utils/utils.mjs, at 111:14, in evalInContentDatabase() [javascript]
2024-06-07 18:28:15.100 Notice:+in ./SearchCriteriaProcessor.mjs, at 188:17, in getSearchResults() [javascript]
2024-06-07 18:28:15.100 Notice:+in ./searchLib.mjs, at 178:38, in _search() [javascript]
2024-06-07 18:28:15.100 Notice:+in ./searchLib.mjs, at 81:9, in search() [javascript]
2024-06-07 18:28:15.100 Notice:+in ../../lib/facetsLib.mjs, at 313:17, in _getViaSearchFacetValues() [javascript]
2024-06-07 18:28:15.100 Notice:+in ../../lib/facetsLib.mjs, at 269:17, in _getViaSearchFacets() [javascript]
2024-06-07 18:28:15.100 Notice:+in ../../lib/facetsLib.mjs, at 52:37, in getFacets() [javascript]
2024-06-07 18:28:15.100 Notice:+in /ds/lux/facets.mjs, at 12:9 [javascript]
2024-06-07 18:28:15.100 Notice:+in ../../lib/requestHandleLib.mjs, at 10:9, in handleRequest() [javascript]
2024-06-07 18:28:15.100 Notice:+in /ds/lux/facets.mjs, at 11:0 [javascript]
2024-06-07 18:28:15.100 Notice:+in xdmp:eval("\nconst op = require(\"/MarkLogic/optic\");\nconst crm = op.pref...", {}, {database:"7947491399340755119", update:"false"}) [javascript]
2024-06-07 18:28:15.100 Notice:+in /ds/lux/facets.mjs [javascript]
2024-06-07 18:28:15.100 Notice:+in /ds/lux/facets.mjs [javascript]
2024-06-07 18:28:15.102 Warning: {"errorResponse":{"statusCode":500,"status":"Internal Server Error","messageCode":"JS-BAD","message":"Unexpected failure"}}
2024-06-07 18:26:22.219 Info: [Event:id=LuxFacets] Searching for the 'responsibleCollections' facet values.
2024-06-07 18:26:22.219 Info: [Event:id=LuxSearch] Search 14325137610764299934 parameters: {"searchCriteria":{"_scope":"set","AND":[{"containingItem":{"text":"booked","_lang":"en"}},{"classification":{"id":"https://lux.collections.yale.edu/data/concept/2340bd83-22b4-4a39-ae46-ba0258e3a79d"}}]},"searchScope":null,"mayChangeScope":false,"page":1,"pageLength":101,"requestContext":"viaSearchFacet","mayExceedMaximumPageLength":true,"mayEstimate":true,"sortDelimitedStr":"","facetsSoon":false,"synonymsEnabled":false}
...
2024-06-07 18:27:21.160 Info: [Event:id=LuxSearch] Search 14325137610764299934 errored out: {"stack":"JS-BAD: .toArray() -- Unexpected failure: checkMaybeLocal wrapValue valObj\nin [anonymous], at 35:5 [javascript]\nin ../utils/utils.mjs, at 111:14, in evalInContentDatabase() [javascript]\nin ./SearchCriteriaProcessor.mjs, at 188:17, in getSearchResults() [javascript]\nin ./searchLib.mjs, at 178:38, in _search() [javascript]\nin ./searchLib.mjs, at 81:9, in search() [javascript]\nin ../../lib/facetsLib.mjs, at 313:17, in _getViaSearchFacetValues() [javascript]\nin ../../lib/facetsLib.mjs, at 269:17, in _getViaSearchFacets() [javascript]\nin ../../lib/facetsLib.mjs, at 52:37, in getFacets() [javascript]\nin /ds/lux/facets.mjs, at 12:9 [javascript]\nin ../../lib/requestHandleLib.mjs, at 10:9, in handleRequest() [javascript]\nin /ds/lux/facets.mjs, at 11:0 [javascript]\nin xdmp:eval(\"\\nconst op = require(\\\"/MarkLogic/optic\\\");\\nconst crm = op.pref...\", {}, {database:\"7947491399340755119\", update:\"false\"}) [javascript]\nin /ds/lux/facets.mjs [javascript]\nin /ds/lux/facets.mjs [javascript]","mlerr":{},"data":["checkMaybeLocal","wrapValue valObj"],"message":"Unexpected failure","name":"JS-BAD","retryable":false,"stackFrames":[{"uri":"[anonymous]","line":35,"column":5,"operation":".toArray()","language":"javascript"},{"uri":"../utils/utils.mjs","line":111,"column":14,"operation":"evalInContentDatabase","language":"javascript","isEval":false},{"uri":"./SearchCriteriaProcessor.mjs","line":188,"column":17,"operation":"getSearchResults","language":"javascript","isEval":false},{"uri":"./searchLib.mjs","line":178,"column":38,"operation":"_search","language":"javascript","isEval":false},{"uri":"./searchLib.mjs","line":81,"column":9,"operation":"search","language":"javascript","isEval":false},{"uri":"../../lib/facetsLib.mjs","line":313,"column":17,"operation":"_getViaSearchFacetValues","language":"javascript","isEval":false},{"uri":"../../lib/facetsLib.mjs","line":269,"column":17,"operation":"_getViaSearchFacets","language":"javascript","isEval":false},{"uri":"../../lib/facetsLib.mjs","line":52,"column":37,"operation":"getFacets","language":"javascript","isEval":false},{"uri":"/ds/lux/facets.mjs","line":12,"column":9,"language":"javascript","isEval":false},{"uri":"../../lib/requestHandleLib.mjs","line":10,"column":9,"operation":"handleRequest","language":"javascript","isEval":false},{"uri":"/ds/lux/facets.mjs","line":11,"column":0,"operation":"xdmp:eval(\"\\nconst op = require(\\\"/MarkLogic/optic\\\");\\nconst crm = op.pref...\", {}, {database:\"7947491399340755119\", update:\"false\"})","language":"javascript","isEval":false}]}
2024-06-07 18:27:21.160 Info: [Event:id=LuxSearch] {"requestId":"14325137610764299934", "requestCompleted":false, "milliseconds":{"total":58941, "process":1}, "estimate":0, "page":1, "pageLength":101, "requestContext":"viaSearchFacet", "returned":0, "scope":"set", "changedScope":false, "criteria":{"AND":[{"containingItem":{"text":"booked", "_lang":"en"}}, {"classification":{"id":"https://lux.collections.yale.edu/data/concept/2340bd83-22b4-4a39-ae46-ba0258e3a79d"}}]}, "ignoredTerms":[], "query":"\nconst op = require(\"/MarkLogic/optic\");\nconst crm = op.prefixer(\"http://www.cidoc-crm.org/cidoc-crm/\");\nconst la = op.prefixer(\"https://linked.art/ns/terms/\");\nconst lux = op.prefixer(\"https://lux.collections.yale.edu/ns/\");\nconst skos = op.prefixer(\"http://www.w3.org/2004/02/skos/core#\");\n const docs = fn.subsequence(\n cts.search(cts.andQuery([\n cts.jsonPropertyValueQuery('dataType', [\"Set\"], ['exact']),\n cts.andQuery([cts.documentQuery(cts\n .triples(\n [],\n [la(\"member_of\")],\n [],\n '=',\n ['eager', 'concurrent'],\n cts.orQuery([\n cts.fieldWordQuery([\"itemAnyText\"], \"booked\", [\"case-insensitive\", \"diacritic-insensitive\", \"punctuation-insensitive\", \"whitespace-insensitive\", \"stemmed\", \"wildcarded\"], 1),\n cts.tripleRangeQuery(\n [],\n [lux(\"itemAny\")],\n fn.insertBefore(\n cts.values(\n cts.iriReference(),\n '',\n ['eager', 'concurrent'],\n cts.fieldWordQuery([\"referencePrimaryName\"], \"booked\", [\"case-insensitive\", \"diacritic-insensitive\", \"punctuation-insensitive\", \"whitespace-insensitive\", \"stemmed\", \"wildcarded\"], 1)\n ), \n 0,\n sem.iri('/does/not/exist')\n ), '=', [], 1\n )\n ])\n )\n .toArray()\n .map((x) => sem.tripleObject(x))\n .concat(sem.iri('/does/not/exist'))), cts.fieldValueQuery(\n [\"setTypeId\"],\n [\"https://lux.collections.yale.edu/data/concept/2340bd83-22b4-4a39-ae46-ba0258e3a79d\"],\n [\"exact\"],\n 1\n )])\n ]), [\"unfaceted\"]), \n 1, \n 101\n ).toArray();\n const results = [];\n for (const doc of docs) {\n results.push({id: doc.baseURI, type: doc.xpath('/json/type')});\n }\n results;"}
2024-06-07 18:27:21.161 Info: [Event:id=LuxFacets] Failed to calculate the following facets after 58944 milliseconds: responsibleCollections
2024-06-07 18:27:21.161 Notice: JS-BAD: .toArray() -- Unexpected failure: checkMaybeLocal wrapValue valObj
2024-06-07 18:27:21.161 Notice:+in [anonymous], at 35:5 [javascript]
2024-06-07 18:27:21.161 Notice:+in ../utils/utils.mjs, at 111:14, in evalInContentDatabase() [javascript]
2024-06-07 18:27:21.161 Notice:+in ./SearchCriteriaProcessor.mjs, at 188:17, in getSearchResults() [javascript]
2024-06-07 18:27:21.161 Notice:+in ./searchLib.mjs, at 178:38, in _search() [javascript]
2024-06-07 18:27:21.161 Notice:+in ./searchLib.mjs, at 81:9, in search() [javascript]
2024-06-07 18:27:21.161 Notice:+in ../../lib/facetsLib.mjs, at 313:17, in _getViaSearchFacetValues() [javascript]
2024-06-07 18:27:21.161 Notice:+in ../../lib/facetsLib.mjs, at 269:17, in _getViaSearchFacets() [javascript]
2024-06-07 18:27:21.161 Notice:+in ../../lib/facetsLib.mjs, at 52:37, in getFacets() [javascript]
2024-06-07 18:27:21.161 Notice:+in /ds/lux/facets.mjs, at 12:9 [javascript]
2024-06-07 18:27:21.161 Notice:+in ../../lib/requestHandleLib.mjs, at 10:9, in handleRequest() [javascript]
2024-06-07 18:27:21.161 Notice:+in /ds/lux/facets.mjs, at 11:0 [javascript]
2024-06-07 18:27:21.161 Notice:+in xdmp:eval("\nconst op = require(\"/MarkLogic/optic\");\nconst crm = op.pref...", {}, {database:"7947491399340755119", update:"false"}) [javascript]
2024-06-07 18:27:21.161 Notice:+in /ds/lux/facets.mjs [javascript]
2024-06-07 18:27:21.161 Notice:+in /ds/lux/facets.mjs [javascript]
2024-06-07 18:27:21.165 Warning: {"errorResponse":{"statusCode":500,"status":"Internal Server Error","messageCode":"JS-BAD","message":"Unexpected failure"}}
2024-06-07 18:27:21.891 Info: [Event:id=LuxFacets] Searching for the 'responsibleCollections' facet values.
2024-06-07 18:27:21.892 Info: [Event:id=LuxSearch] Search 6675450089992221355 parameters: {"searchCriteria":{"_scope":"set","AND":[{"containingItem":{"text":"booked","_lang":"en"}},{"classification":{"id":"https://lux.collections.yale.edu/data/concept/2340bd83-22b4-4a39-ae46-ba0258e3a79d"}}]},"searchScope":null,"mayChangeScope":false,"page":1,"pageLength":101,"requestContext":"viaSearchFacet","mayExceedMaximumPageLength":true,"mayEstimate":true,"sortDelimitedStr":"","facetsSoon":false,"synonymsEnabled":false}
...
2024-06-07 18:28:21.288 Info: [Event:id=LuxSearch] Search 6675450089992221355 errored out: {"stack":"JS-BAD: .toArray() -- Unexpected failure: checkMaybeLocal wrapValue valObj\nin [anonymous], at 35:5 [javascript]\nin ../utils/utils.mjs, at 111:14, in evalInContentDatabase() [javascript]\nin ./SearchCriteriaProcessor.mjs, at 188:17, in getSearchResults() [javascript]\nin ./searchLib.mjs, at 178:38, in _search() [javascript]\nin ./searchLib.mjs, at 81:9, in search() [javascript]\nin ../../lib/facetsLib.mjs, at 313:17, in _getViaSearchFacetValues() [javascript]\nin ../../lib/facetsLib.mjs, at 269:17, in _getViaSearchFacets() [javascript]\nin ../../lib/facetsLib.mjs, at 52:37, in getFacets() [javascript]\nin /ds/lux/facets.mjs, at 12:9 [javascript]\nin ../../lib/requestHandleLib.mjs, at 10:9, in handleRequest() [javascript]\nin /ds/lux/facets.mjs, at 11:0 [javascript]\nin xdmp:eval(\"\\nconst op = require(\\\"/MarkLogic/optic\\\");\\nconst crm = op.pref...\", {}, {database:\"7947491399340755119\", update:\"false\"}) [javascript]\nin /ds/lux/facets.mjs [javascript]\nin /ds/lux/facets.mjs [javascript]","mlerr":{},"data":["checkMaybeLocal","wrapValue valObj"],"message":"Unexpected failure","name":"JS-BAD","retryable":false,"stackFrames":[{"uri":"[anonymous]","line":35,"column":5,"operation":".toArray()","language":"javascript"},{"uri":"../utils/utils.mjs","line":111,"column":14,"operation":"evalInContentDatabase","language":"javascript","isEval":false},{"uri":"./SearchCriteriaProcessor.mjs","line":188,"column":17,"operation":"getSearchResults","language":"javascript","isEval":false},{"uri":"./searchLib.mjs","line":178,"column":38,"operation":"_search","language":"javascript","isEval":false},{"uri":"./searchLib.mjs","line":81,"column":9,"operation":"search","language":"javascript","isEval":false},{"uri":"../../lib/facetsLib.mjs","line":313,"column":17,"operation":"_getViaSearchFacetValues","language":"javascript","isEval":false},{"uri":"../../lib/facetsLib.mjs","line":269,"column":17,"operation":"_getViaSearchFacets","language":"javascript","isEval":false},{"uri":"../../lib/facetsLib.mjs","line":52,"column":37,"operation":"getFacets","language":"javascript","isEval":false},{"uri":"/ds/lux/facets.mjs","line":12,"column":9,"language":"javascript","isEval":false},{"uri":"../../lib/requestHandleLib.mjs","line":10,"column":9,"operation":"handleRequest","language":"javascript","isEval":false},{"uri":"/ds/lux/facets.mjs","line":11,"column":0,"operation":"xdmp:eval(\"\\nconst op = require(\\\"/MarkLogic/optic\\\");\\nconst crm = op.pref...\", {}, {database:\"7947491399340755119\", update:\"false\"})","language":"javascript","isEval":false}]}
2024-06-07 18:28:21.288 Info: [Event:id=LuxSearch] {"requestId":"6675450089992221355", "requestCompleted":false, "milliseconds":{"total":59396, "process":1}, "estimate":0, "page":1, "pageLength":101, "requestContext":"viaSearchFacet", "returned":0, "scope":"set", "changedScope":false, "criteria":{"AND":[{"containingItem":{"text":"booked", "_lang":"en"}}, {"classification":{"id":"https://lux.collections.yale.edu/data/concept/2340bd83-22b4-4a39-ae46-ba0258e3a79d"}}]}, "ignoredTerms":[], "query":"\nconst op = require(\"/MarkLogic/optic\");\nconst crm = op.prefixer(\"http://www.cidoc-crm.org/cidoc-crm/\");\nconst la = op.prefixer(\"https://linked.art/ns/terms/\");\nconst lux = op.prefixer(\"https://lux.collections.yale.edu/ns/\");\nconst skos = op.prefixer(\"http://www.w3.org/2004/02/skos/core#\");\n const docs = fn.subsequence(\n cts.search(cts.andQuery([\n cts.jsonPropertyValueQuery('dataType', [\"Set\"], ['exact']),\n cts.andQuery([cts.documentQuery(cts\n .triples(\n [],\n [la(\"member_of\")],\n [],\n '=',\n ['eager', 'concurrent'],\n cts.orQuery([\n cts.fieldWordQuery([\"itemAnyText\"], \"booked\", [\"case-insensitive\", \"diacritic-insensitive\", \"punctuation-insensitive\", \"whitespace-insensitive\", \"stemmed\", \"wildcarded\"], 1),\n cts.tripleRangeQuery(\n [],\n [lux(\"itemAny\")],\n fn.insertBefore(\n cts.values(\n cts.iriReference(),\n '',\n ['eager', 'concurrent'],\n cts.fieldWordQuery([\"referencePrimaryName\"], \"booked\", [\"case-insensitive\", \"diacritic-insensitive\", \"punctuation-insensitive\", \"whitespace-insensitive\", \"stemmed\", \"wildcarded\"], 1)\n ), \n 0,\n sem.iri('/does/not/exist')\n ), '=', [], 1\n )\n ])\n )\n .toArray()\n .map((x) => sem.tripleObject(x))\n .concat(sem.iri('/does/not/exist'))), cts.fieldValueQuery(\n [\"setTypeId\"],\n [\"https://lux.collections.yale.edu/data/concept/2340bd83-22b4-4a39-ae46-ba0258e3a79d\"],\n [\"exact\"],\n 1\n )])\n ]), [\"unfaceted\"]), \n 1, \n 101\n ).toArray();\n const results = [];\n for (const doc of docs) {\n results.push({id: doc.baseURI, type: doc.xpath('/json/type')});\n }\n results;"}
2024-06-07 18:28:21.289 Info: [Event:id=LuxFacets] Failed to calculate the following facets after 59400 milliseconds: responsibleCollections
2024-06-07 18:28:21.289 Notice: JS-BAD: .toArray() -- Unexpected failure: checkMaybeLocal wrapValue valObj
2024-06-07 18:28:21.289 Notice:+in [anonymous], at 35:5 [javascript]
2024-06-07 18:28:21.289 Notice:+in ../utils/utils.mjs, at 111:14, in evalInContentDatabase() [javascript]
2024-06-07 18:28:21.289 Notice:+in ./SearchCriteriaProcessor.mjs, at 188:17, in getSearchResults() [javascript]
2024-06-07 18:28:21.289 Notice:+in ./searchLib.mjs, at 178:38, in _search() [javascript]
2024-06-07 18:28:21.289 Notice:+in ./searchLib.mjs, at 81:9, in search() [javascript]
2024-06-07 18:28:21.289 Notice:+in ../../lib/facetsLib.mjs, at 313:17, in _getViaSearchFacetValues() [javascript]
2024-06-07 18:28:21.289 Notice:+in ../../lib/facetsLib.mjs, at 269:17, in _getViaSearchFacets() [javascript]
2024-06-07 18:28:21.289 Notice:+in ../../lib/facetsLib.mjs, at 52:37, in getFacets() [javascript]
2024-06-07 18:28:21.289 Notice:+in /ds/lux/facets.mjs, at 12:9 [javascript]
2024-06-07 18:28:21.289 Notice:+in ../../lib/requestHandleLib.mjs, at 10:9, in handleRequest() [javascript]
2024-06-07 18:28:21.289 Notice:+in /ds/lux/facets.mjs, at 11:0 [javascript]
2024-06-07 18:28:21.289 Notice:+in xdmp:eval("\nconst op = require(\"/MarkLogic/optic\");\nconst crm = op.pref...", {}, {database:"7947491399340755119", update:"false"}) [javascript]
2024-06-07 18:28:21.289 Notice:+in /ds/lux/facets.mjs [javascript]
2024-06-07 18:28:21.289 Notice:+in /ds/lux/facets.mjs [javascript]
2024-06-07 18:28:21.292 Warning: {"errorResponse":{"statusCode":500,"status":"Internal Server Error","messageCode":"JS-BAD","message":"Unexpected failure"}}
From 22 Jun 23:
20230622-dev-node-154-8003-ErrorLog-trimmed.txt:2023-06-22 16:39:56.106 Info: [Event:id=LuxFacets] Failed to calculate the following facets after 58206 milliseconds: responsibleUnits
20230622-dev-node-237-8003-ErrorLog-trimmed.txt:2023-06-22 16:39:57.087 Info: [Event:id=LuxFacets] Failed to calculate the following facets after 58221 milliseconds: responsibleCollections
20230622-dev-node-5-8003-ErrorLog-trimmed.txt:2023-06-22 17:17:21.023 Info: [Event:id=LuxFacets] Failed to calculate the following facets after 58536 milliseconds: responsibleCollections
20230622-dev-node-5-8003-ErrorLog-trimmed.txt:2023-06-22 17:17:21.290 Info: [Event:id=LuxFacets] Failed to calculate the following facets after 58796 milliseconds: responsibleUnits
cc: @clarkepeterf
Out of the remaining entries non-trace event entries found in the 8003 and 8004 application error logs, there are three JS-JAVASCRIPT
instances. They all report Error running JavaScript request: null
and --per the stack trace-- fail in either the searchWillMatch
of facets
data service. These data services have 9 - 14 lines of code. Not sure what would be null
that isn't already protected. Perhaps they are failing within their respective library code? Perhaps ML failed to load the code? The associated access and request log entries do not provide additional insights. A bit of a mystery but also only three requests.
2024-06-07 17:27:14.115 Notice: JS-JAVASCRIPT: Error running JavaScript request: null
2024-06-07 17:27:14.115 Notice:+in /ds/lux/searchWillMatch.mjs [javascript]
2024-06-07 17:27:14.115 Notice:+in /ds/lux/searchWillMatch.mjs [javascript]
2024-06-07 17:27:14.123 Warning: {"errorResponse":{"statusCode":400,"status":"Bad Request","messageCode":"BadRequestError","message":"null"}}
2024-06-07 18:14:29.560 Notice: JS-JAVASCRIPT: Error running JavaScript request: null
2024-06-07 18:14:29.560 Notice:+in /ds/lux/searchWillMatch.mjs [javascript]
2024-06-07 18:14:29.560 Notice:+in /ds/lux/searchWillMatch.mjs [javascript]
2024-06-07 18:14:29.563 Warning: {"errorResponse":{"statusCode":400,"status":"Bad Request","messageCode":"BadRequestError","message":"null"}}
2024-06-07 18:29:05.201 Notice: JS-JAVASCRIPT: Error running JavaScript request: null
2024-06-07 18:29:05.201 Notice:+in /ds/lux/facets.mjs [javascript]
2024-06-07 18:29:05.201 Notice:+in /ds/lux/facets.mjs [javascript]
2024-06-07 18:29:05.205 Warning: {"errorResponse":{"statusCode":400,"status":"Bad Request","messageCode":"BadRequestError","message":"null"}}
Request response times:
Endpoint | < 10 millis | < 100 millis | < 1 second | < 3 seconds | 3-10 seconds | > 10 seconds | Total |
---|---|---|---|---|---|---|---|
Search | 2.25% | 32.81% | 80.93% | 100.00% | 0.00% | 0.00% | 2,758 |
Facets | 5.56% | 26.83% | 88.57% | 95.54% | 2.79% | 1.66% | 32,348 |
Related Lists | 0.00% | 0.00% | 47.31% | 92.65% | 5.49% | 1.87% | 911 |
Search Will Match | 0.71% | 15.25% | 60.79% | 82.57% | 2.18% | 15.25% | 22,760 |
Search Estimate | 5.30% | 35.37% | 98.84% | 100.00% | 0.00% | 0.00% | 15,385 |
Document | 97.89% | 99.98% | 100.00% | 100.00% | 0.00% | 0.00% | 169,433 |
Other endpoints | 51,416 | ||||||
Total Requests | 295,011 |
Mined log output:
JS-FATAL
instances within the application error logs, which may be a different way v8 engine crashes are reported. All were associated to searchWillMatch
requests. We may want to log more information about searchWillMatch
requests and/or configure a system monitoring test in Blue/Green. Follow-up tickets: https://github.com/project-lux/lux-marklogic/issues/177 and https://git.yale.edu/lux-its/ml-cluster-formation/issues/19There are some peculiar errors that may be worth noting. The root cause must be some error from in the frontend, but @brent-hartwig - would you say this error "Document is not JSON" was returned by MarkLogic?
failed search estimate for [object Object] - Document is not JSON
There were 1,366 of them.
Also interesting is that there were only(?) five JS-BAD errors and all of them were with the same data:
2024-06-07T18:32:31.256Z error: [121] failed to get doc for /data/set/1cbd9fc8-809a-4d36-9687-73f841c701e2 - JS-BAD: Unexpected failure: checkMaybeLocal wrapObjectNode objnObj
2024-06-07T17:30:33.371Z error: [64] failed to get doc for /data/set/1cbd9fc8-809a-4d36-9687-73f841c701e2 - JS-BAD: Unexpected failure: checkMaybeLocal wrapObjectNode objnObj
2024-06-07T18:52:31.782Z error: [80] failed to get doc for /data/set/1cbd9fc8-809a-4d36-9687-73f841c701e2 - JS-BAD: Unexpected failure: checkMaybeLocal wrapObjectNode objnObj
2024-06-07T18:26:59.301Z error: [81] failed to get doc for /data/set/1cbd9fc8-809a-4d36-9687-73f841c701e2 - JS-BAD: Unexpected failure: checkMaybeLocal wrapObjectNode objnObj
2024-06-07T18:58:53.379Z error: [65] failed to get doc for /data/set/1cbd9fc8-809a-4d36-9687-73f841c701e2 - JS-BAD: Unexpected failure: checkMaybeLocal wrapObjectNode objnObj
And there were 22 errors about retries for 504, e.g.,
failed to get doc for /data/place/067073a4-c71a-4238-8531-4de99805941c?profile=name - call to /ds/lux/document.mjs: retry failed for 504 status after 31 attempts over 121.204 seconds with path:
As previously discussed, this is the behavior of the marklogic client library that we cannot change, but 31 attempts over 121 seconds seems just too much, doesn't quite make sense to me.
@gigamorph, thanks for pointing out the 'document is not JSON' errors. I encountered them and thought I chased down the issue but must have forgotten about it. Here's a complete set of entries per instance:
2024-06-07 17:35:41.880 Notice: XDMP-JSONDOC: Document is not JSON
2024-06-07 17:35:41.880 Notice:+in /ds/lux/searchEstimate.mjs [1.0-ml]
2024-06-07 17:35:41.884 Warning: {"errorResponse":{"statusCode":400,"status":"Bad Request","messageCode":"XDMP-JSONDOC","message":"Document is not JSON"}}
Based the grep results/count of the middle line, all instances occurred within searchEstimate.mjs, leading me to believe something in the performance test is compelling the middle tier to send invalid search criteria. Undoubtedly there are other possibilities but I don't know that the backend logs will help with that determination.
@gigamorph, the five JS-BAD
errors that reached the middle tier are a subset of all JS-BAD
errors reported in the backend. These particular five are failed searchWillMatch
requests. I'm yet to determine why they failed, and here too, the backend logs may not provide additional insights. Please see this comment for the original write-up.
@gigamorph,
As previously discussed, this is the behavior of the marklogic client library that we cannot change, but 31 attempts over 121 seconds seems just too much, doesn't quite make sense to me.
Agreed. Since the backend logs document https://lux.collections.yale.edu/data/place/067073a4-c71a-4238-8531-4de99805941c
was successfully found 119 times, I suspect some to all of the "failed to get doc" errors (all URIs) recorded by the middle tier were caused by a MarkLogic application server request queue being full for the majority of the test. The original comment associating 504s to a full app server queue may be found here. And the associated follow-up tickets are https://github.com/project-lux/lux-middletier/issues/67 and https://github.com/project-lux/lux-marklogic/issues/180.
That aside, we have also discussed submitting a support ticket / RFE requesting more control over data service retry behavior. I'm not opposed to doing so but wouldn't use this example. Hopefully the middle tier receives 504s (or 503s) when an app server request queue is full. Such requests should be retried.
One of 119 successfully requests for the above document:
2024-06-07 17:41:43.220 Info: [Event:id=LuxNamedProfiles] Applied the 'name' profile to 'https://lux.collections.yale.edu/data/place/067073a4-c71a-4238-8531-4de99805941c' in 1 milliseconds.
Related to there being fewer than expected backend requests...
During a performance test, how many frontend requests should there be for every backend request? We have never attempted to establish this metric. We would either have to account for requests the frontend makes to other sources (such as for images and analytics) or include them.
When included, this test has 105% to 202% more frontend requests compared to the 14 Feb 24 and 22 Jun 23 tests, respectively. Since SI was introduced in Mar 2024, we can't really compare to tests older than that.
Perhaps we can exclude those other requests. We asked QA if their test logs can tell us how many middle tier requests there were, ideally by status code. (Middle tier requests have paths beginning with "/api/" or "/data/".)
@gigamorph was able to determine there were 363,788 middle tier requests during the test via:
SELECT COUNT(*) FROM lux_cloudfront_tst
WHERE date = DATE '2024-06-07' AND time > '17:02:00' AND time < '19:16:30'
AND (cs_uri_stem LIKE '/api/%' OR cs_uri_stem LIKE '/data/%');
Given most if not all middle tier requests result in multiple backend requests, neither of us would expect more middle tier requests than backend requests, yet that's what the counts look like as there were only 307,511 backend requests. We do not yet know why.
Middle tier request counts by status code:
Status code | Requests | Percent | Requests | Percent |
---|---|---|---|---|
0 | 66,949 | 18.40% | ||
200 | 178,199 | 48.98% | ||
304 | 9,700 | 2.67% | ||
Sub-total: through 300s | 254,848 | 70.05% | ||
400 | 1,218 | 0.33% | ||
404 | 755 | 0.21% | ||
500 | 12 | 0.00% | ||
502 | 485 | 0.13% | ||
503 | 106,465 | 29.27% | ||
504 | 5 | 0.00% | ||
Sub-total: 400s and 500s | 108,940 | 29.95% | ||
Totals | 363,788 | 363,788 |
Questions:
Sampling of requests with 503 response status codes:
@brent-hartwig SI was implemented in March. For almost every clickable action, there is an SI event being pushed. I'm not sure if that helps though.
Thanks, @kamerynB. It does. I updated my comment accordingly.
Why was 304 (not modified) returned for nearly 10K requests when caching should be disabled?
Seong researched and provided the following explanation. I agree we do not need to pursue this. In this test, they only accounted for 3% of the middle tier responses.
The node express library that middle tier uses as the web middleware, automatically generates and attaches an etag to each response. ETag is basically hash of the response data that enables the client to determine if the content has changed since the last request. Any well-established client including the web browsers will include an "If-None-Match: ${etag}" header in its requests telling the server to only send the full response (w/ 200) if the etag for the new response is not the previous one that the client(browser) cache currently has. Otherwise the server will just send back the 304 code with an empty body to save the traffic load.
So, that's why you see a lot of 304s in the browser network console. And the thing is, CloudFront and the web cache (Varnish) will each likely do the same thing, so if we need to (I doubt we need), it is not easy to disable all this 304 responses.
Decisions on executive summary items from conversation with @jffcamp and @prowns today:
searchWillMatch
instances will be helped by #177. There was also a facets
instance, that would not be.Closing this ticket as all findings have been "acknowledged, understood, issue submitted, or resolved". For the most part, links were added to #181 comments that continue the associated investigation or links to new tickets were added.
Primary objective: Test the system stability and performance impact of #160. Given previous analysis, we believe #160 will:
If available, #161 can be part of this performance test as well. Given the aforementioned analysis was of a search for works, semantic facets were not involved; as such, it is welcome but not required by this test.
Code and Configuration Changes: Required: implementation of #160 and associated frontend and middle tier implementations. Optional: implementation of #161; if included, its associated frontend and middle tier implementations are required.
Environment and versions (update as needed):
Scenario AK of the Perf Test Line Up: "See if paginating facet requests (#160 and #161) restores system stability by reducing --if not eliminating-- v8 engine crashes."
Backend application server configuration:
For more information please see the documentation: LUX Performance Testing Procedure
Tasks:
v8 delay timeout
.Collect Data (Details from procedure):
Restore and Verify Environment:
Analyze:
Notes: