Closed by jffcamp 1 week ago
One of Brent's grep commands: `grep -v Info ErrorLog.txt | grep -v Debug`
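The same filter can be done in a single pass. A minimal sketch, assuming the intent is simply to drop lines containing either marker (the sample ErrorLog.txt contents are hypothetical, just for demonstration):

```shell
# Hypothetical sample log to demonstrate on:
printf 'Info: started\nDebug: cache hit\nError: timeout\n' > ErrorLog.txt

# Single grep invocation equivalent to the two chained greps above:
# exclude lines containing "Info" or "Debug".
grep -v -e Info -e Debug ErrorLog.txt
```

With the sample file above, only the `Error: timeout` line survives the filter.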
Middle tier stats: 20240521-tst-middle-tier-stats.txt
Start Time: 12:57:16 pm EST; End Time: 3:06:58 pm EST
Collect OS metrics on all 3 ML nodes:
cd; cd Apps/LUX/ML
$ ssh -i ch-lux-ssh-prod.pem ec2-user@10.5.156.217
nohup sudo sar -u -r -o /tmp/sar_${HOSTNAME}_$(date +"%Y-%m-%dT%H%M%S").out 10 >/tmp/sar_${HOSTNAME}_$(date +"%Y-%m-%dT%H%M%S")_screen.out 2>&1 &
$ ssh -i ch-lux-ssh-prod.pem ec2-user@10.5.157.111
$ ssh -i ch-lux-ssh-prod.pem ec2-user@10.5.254.22
cd /tmp
ls -l sar*05-21*
sudo gzip sar*05-21*
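The per-node startup step above can be scripted in one loop. A minimal sketch, assuming the three node IPs and key file shown earlier; the loop only prints each ssh command here so the sketch is side-effect free (drop the `echo` to actually run it):

```shell
NODES="10.5.156.217 10.5.157.111 10.5.254.22"
KEY=ch-lux-ssh-prod.pem

# The sar invocation from above; single-quoted so ${HOSTNAME} and the
# timestamp expand on the remote host, not locally.
SAR_CMD='nohup sudo sar -u -r -o /tmp/sar_${HOSTNAME}_$(date +"%Y-%m-%dT%H%M%S").out 10 >/tmp/sar_${HOSTNAME}_$(date +"%Y-%m-%dT%H%M%S")_screen.out 2>&1 &'

for node in $NODES; do
  # Print the command that would start sar collection on each node.
  echo ssh -i "$KEY" "ec2-user@$node" "$SAR_CMD"
done
```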
on local desktop:
cd ~/Apps/LUX/marklogic/scripts/logAnalysis
mkdir ~/Apps/LUX/ML/test/20240521
vi collectBackendLogs.sh
./collectBackendLogs.sh
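The contents of collectBackendLogs.sh aren't shown here. For the sar captures specifically, a download step might look like the following sketch, assuming the node IPs and key file from the commands above and the results directory created for this run; the `echo` is left in so the sketch prints the commands rather than copying anything:

```shell
DEST="$HOME/Apps/LUX/ML/test/20240521"

for node in 10.5.156.217 10.5.157.111 10.5.254.22; do
  # Pull the gzipped sar captures from each node (drop echo to run).
  echo scp -i ch-lux-ssh-prod.pem "ec2-user@$node:/tmp/sar*05-21*.gz" "$DEST/"
done
```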
OS metrics:
- sar_ip-10-5-156-62.its.yale.edu_2024-05-21T165212_screen.out.gz
- sar_ip-10-5-156-62.its.yale.edu_2024-05-21T165212.out.gz
- sar_ip-10-5-157-203.its.yale.edu_2024-05-21T165215_screen.out.gz
- sar_ip-10-5-157-203.its.yale.edu_2024-05-21T165215.out.gz
- sar_ip-10-5-254-44.its.yale.edu_2024-05-21T165217_screen.out.gz
- sar_ip-10-5-254-44.its.yale.edu_2024-05-21T165217.out.gz
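The `.out` files above are binary sar data (written with `sar -o`), so after download they're read back with `sar -f` rather than opened directly. A sketch, guarded so it's a no-op when no files are present:

```shell
# Expand any downloaded captures (-k keeps the original .gz).
for gz in sar_*.out.gz; do
  [ -e "$gz" ] || continue   # skip when the glob matched nothing
  gzip -dk "$gz"
done

# Replay the recorded CPU samples from each binary capture.
for f in sar_*.out; do
  [ -e "$f" ] || continue
  echo "== $f =="
  sar -u -f "$f"
done
```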
ML CPU details
ML Memory details
OS CPU
ALB
It's unknown why the request counts weren't closer among the nodes. Note that the node with the fewest requests also consistently reported higher CPU utilization.
| Node | Request Count | % of Max |
|---|---|---|
| 22 | 56,590 | 90% |
| 111 | 59,499 | 95% |
| 217 | 62,721 | 100% |
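The percentage column is each node's request count relative to the busiest node (217). Recomputing it from the raw counts, as a quick awk check:

```shell
# Each input line is "node request_count"; the END block prints each
# node's count as a percentage of the largest count seen.
printf '22 56590\n111 59499\n217 62721\n' |
awk '{ count[$1] = $2; if ($2 > max) max = $2 }
     END { for (node in count)
             printf "node %s: %.0f%%\n", node, 100 * count[node] / max }'
```

This reproduces the 90% / 95% / 100% figures in the table (for-in iteration order is not guaranteed).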
@xinjianguo, the AWS CPU utilization charting is more granular than MarkLogic's. Can you map the ec2 labels from that chart to nodes 22, 111, and 217? I'd like to know whether AWS also reflects that node 22's CPU was utilized more than the other two. Thank you.
@xinjianguo, I don't yet see the monitoring history exports, so I'm attaching them now. I've just come to realize that the export links on the detailed views serve up different information. As such, I'm attaching what we have always exported, overview-20240521-204411.xls, plus the detailed exports:
@brent-hartwig Oh, I thought we only needed the graphs; I'll capture the exports as well.
Results invalidated. QA lost 10 flows due to a UI change. This caused a single flow failure and prevented all following flows from running.
Approved by UAT
Closing as this ticket was marked as Done the week of 6/3.
Primary objective: We are running this performance test to replicate the good performance test results run last year (Scenario J).
Differences since last test: Three performance tests were executed under #132. We were unable to reproduce Scenario J's outcome. This test is expected to be the equivalent of test no. 2 but in Blue and after restarting the ec2 instances. Blue also has ML 11.0.3 with remnants of a nightly build of ML 11.2.0.
Environment and versions: Blue (as TST), composed of MarkLogic 11.0.3 (downgraded from the 11.2 early release), Backend v1.16.0, Middle Tier v1.1.9, Frontend v1.26, and the dataset produced on 2024-04-18.
Scenario AI of the Perf Test Line Up: our existing dual app server configuration (Scenario J) but after replacing the ec2 instances. ML 11.0.3 environment with remnants of a nightly build of ML 11.2.0.
Key metrics we're targeting (column E / scenario J):
Number of application servers: 2 per node. Maximum number of concurrent application server threads:
For more information please see the documentation: LUX Performance Testing Procedure
Tasks to complete:
- [ ] Deploy Backend v1.16.0 with the `fullTextSearchRelatedFieldName` build property set to `referenceName`.
- [ ] In QC, verify `/lib/appConstants.mjs` includes `const FULL_TEXT_SEARCH_RELATED_FIELD_NAME = 'referenceName'.trim();`
v8 delay timeout
Data collection (details from procedure):
Revert all configuration changes:
- [ ] Deploy Backend v1.16.0 with the `fullTextSearchRelatedFieldName` build property set to `referencePrimaryName`.

Verify:
Analysis: