project-lux / lux-marklogic

Code, issues, and resources related to LUX MarkLogic

Performance test: include alternative names when resolving name search criteria #132

Open jffcamp opened 1 month ago

jffcamp commented 1 month ago

Primary objective: Determine if switching from the smaller primary name fields to the larger fields containing both primary and alternative names still yields acceptable performance. See below for specific changes being tested. The associated development ticket is #100.

Code and Configuration Changes:

  1. The related documents portion of the keyword search pattern switched from the referencePrimaryName field to the referenceName field. By search scope, here are the record types that are included in or excluded from the reference fields:
    • Included:
      • Agent: Group, Person
      • Concept: Language, MeasurementUnit, Type
      • Event: Activity, Period
      • Place: Place
    • Excluded:
      • Concept: Currency, Material
      • Item: DigitalObject, HumanMadeObject
      • Work: LinguisticObject, Set, VisualItem
  2. All name search terms, except those in the set search scope, switched from their search scope-specific primary name field to their broader name field; for instance, the name search term in the agent scope was changed from agentPrimaryName to agentName.
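
A rough illustration of the naming pattern in the two examples given (referencePrimaryName to referenceName, agentPrimaryName to agentName). The helper name broaden_name_field is hypothetical and is not part of the LUX backend. That other scopes' fields follow the same suffix pattern is an assumption, not something this ticket confirms.

```shell
# Hypothetical helper: derive the broader name field from a primary name field
# by rewriting a trailing "PrimaryName" to "Name". Illustration only.
broaden_name_field() {
  printf '%s\n' "$1" | sed 's/PrimaryName$/Name/'
}

# The two field names mentioned in this ticket.
broaden_name_field referencePrimaryName
broaden_name_field agentPrimaryName
```

Field names that do not end in PrimaryName pass through unchanged, so a field that is already the broader one is left as-is.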

Environment and versions: Green (as TST) comprised of MarkLogic 11.0.3, Backend v1.15.0, Middle Tier v1.1.18, Frontend v1.25.2, and Dataset produced on 2024-04-18.

Scenario AH of the Perf Test Line Up: our existing dual app server configuration (Scenario J) but with the above-discussed field difference. The last time Scenario J was tested is documented within https://git.yale.edu/lux-its/marklogic/issues/1033 (internal link).

Key metrics we're targeting (column E / scenario J):

image

Number of application servers: 2 per node. Maximum number of concurrent application server threads:

For more information please see the documentation: LUX Performance Testing Procedure

Tasks to complete:

Data collection (Details from procedure):

Revert all configuration changes:

Verify:

Analysis:

brent-hartwig commented 1 month ago

@jffcamp, https://github.com/project-lux/lux-marklogic/issues/34 includes requests from Engineering for a future performance test. I am interested in testing without the custom error handler, but there could be middle tier implications that we should first discuss with @gigamorph (fewer 408s and more 500s for timed-out requests). We probably need to pass on that for Thursday's test. This ticket's directions do have us enable the requested v8 trace event, which I think we can leave permanently enabled in Blue and Green.

brent-hartwig commented 1 month ago

@jffcamp, a reminder from our last performance test (internal link):

Despite QA revising the LoadRunner scripts to be more in line with Scenario I's June 2023 test and only four additional v8 engine crashes, today's test comprised significantly fewer requests. While this does not invalidate this particular test (given we just need to get crash info to MarkLogic Support ticket no. 35746), we may want to be aware of it for future comparisons to older tests.

Note too that there would have been a larger ratio of search estimate requests compared to June 2023. Back in June, the middle tier would request multiple estimates in a single backend request. That has since changed to a 1:1 ratio between estimate and backend endpoint requests. The same was not yet true for Search Will Match requests.

image

xinjianguo commented 1 month ago

OS metrics

cd; cd Apps/LUX/ML
$ ssh -i ch-lux-ssh-prod.pem ec2-user@10.5.156.104
$ ssh -i ch-lux-ssh-prod.pem ec2-user@10.5.157.73
$ ssh -i ch-lux-ssh-prod.pem ec2-user@10.5.254.20

nohup sudo sar -u -r -o /tmp/sar_${HOSTNAME}_$(date +"%Y-%m-%dT%H%M%S").out 10 >/tmp/sar_${HOSTNAME}_$(date +"%Y-%m-%dT%H%M%S")_screen.out 2>&1 &
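
The command above calls date twice, so the binary file and the screen log can end up with timestamps a second apart. A minimal sketch of one way to keep the two names aligned is to capture the timestamp once. The variable names (ts, host, sar_bin, sar_log) are illustrative assumptions, and the sar invocation is left as a comment so the snippet runs without sudo.

```shell
# Capture the timestamp and host name once so both file names share them.
ts=$(date +"%Y-%m-%dT%H%M%S")
host=${HOSTNAME:-$(uname -n)}

sar_bin="/tmp/sar_${host}_${ts}.out"         # binary data written by sar -o
sar_log="/tmp/sar_${host}_${ts}_screen.out"  # captured screen output

echo "$sar_bin"
echo "$sar_log"

# To start collection with the same options as above (10-second interval):
#   nohup sudo sar -u -r -o "$sar_bin" 10 >"$sar_log" 2>&1 &
# To read CPU utilization back from the binary file afterwards:
#   sar -u -f "$sar_bin"
```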

Screen Shot 2024-05-09 at 11 09 01 AM

received "MarkLogic green Hung or v8 crash" alert at 11:08:49

jffcamp commented 1 month ago

Many crashes. Reverted the change and restarted the test.

jffcamp commented 1 month ago

Test in TST w/change reverted also had errors. Performed performance test in DEV w/out changes. We got crashes early. Also noticed a crash at around 10:15 AM, before any tests were performed.

jffcamp commented 1 month ago

Xinjian to open a ticket with ML support to investigate DEV crashes.

brent-hartwig commented 1 month ago

Summary of tests performed as part of this ticket, on 9 May 24...

First Test

The scenario we set out to initially test: TST environment with Green backend resolving name search criteria against primary and alternative names.

The first v8 engine crash occurred after five minutes and more ensued. Given the v8 engine did not crash when Scenario J was tested (and retested) in June 2023, we were concerned the addition of alternative names was causing the issue. The test was aborted after 20 to 30 minutes.

Second Test

Elected to switch back to primary names but keep the rest the same: TST environment with Green backend resolving name search criteria against primary names alone.

The first v8 engine crash was also early on. The crashes were slightly less frequent than in the first test but still much more frequent than in other scenarios tested. This test was also aborted after 20 to 30 minutes.

We suspected the mixed MarkLogic version environment may be a larger factor than anticipated. Back in Feb 2024, Green and Blue were upgraded to a ML 11.2 nightly build for some testing before bringing ML 11.0.3 back. Due to not starting with a new data directory, there were 11.2 remnants, including an upgraded Security database and journal files that ML 11.0.3 detected and ignored.

Third Test

Elected to move over to DEV, which is considered to be a clean install of ML 11.2.0 GA. This was this project's first performance test against ML 11.2.0. We deployed vanilla Backend v1.15.0, meaning alternative names were not in play. Basically, test 3 was the same as test 2 but in DEV / ML 11.2.0.

We didn't witness a better outcome. Here too, v8 engine crashes occurred early on and persisted. v8 engine crashes prior to the performance test were also observed in DEV.

We opened support ticket no. 36846 requesting Support help diagnose the two sets of v8 engine crashes.