Multiple v1.0.9 tickets add fields and field range indexes, and the dataset is changing. Both are grounds for performing the following.
Capture and document metrics pegged to dataset and backend versions. Metrics of interest include:

- Size of the lux-content database on disk (all forests). Compare to the Admin UI's figure and go with the larger of the two.
- Expansion ratio: the size of the documents sent into MarkLogic relative to the lux-content database size.
- Number of documents and fragments.
- Memory utilization/footprint.
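The size and ratio bookkeeping above can be sketched in a few lines. All numbers below are made up for illustration, and the ratio is computed as database size over input size (one reading of "expansion ratio"); adjust to whichever direction the team standardizes on.

```python
# Hypothetical figures; in practice the per-forest sizes would come from
# forest-report.xqy and the Admin UI number from the database status page.
forest_sizes_mb = [1_745, 1_690, 1_702]  # per-forest on-disk sizes (MB)
admin_ui_size_mb = 5_200                 # database size per the Admin UI (MB)
input_size_mb = 1_300                    # total size of documents sent into MarkLogic (MB)

# Database size on disk: sum of all forests, compared to the Admin UI; keep the larger.
db_size_mb = max(sum(forest_sizes_mb), admin_ui_size_mb)

# Expansion ratio: how much the data grew on ingest (assumed direction).
expansion_ratio = db_size_mb / input_size_mb

print(db_size_mb, round(expansion_ratio, 2))
```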
I added some scripts collected over the years to sizingAndHealthCheck. They should be reviewed before use and may require updates, but they still have value. forest-report.xqy and forest-based-sizing-tool.xlsx are the most recent additions, and were created by me. The former's output is input to the latter. It's a bit dense but could be worth becoming familiar with over time.
Capture and document the ingestion rate along with the number of concurrent threads, the batch size, and whether the pipeline is loading it (from memory) versus MLCP (from disk). Include:

- Average batch duration toward the beginning of the load.
- Average batch duration toward the end of the load.
- Total duration.
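A minimal sketch of how these figures could be derived from a batch log. The durations, batch size, thread count, and the first-half/second-half split are all assumptions for illustration; the wall-clock estimate simply divides serial batch time by the thread count, which ignores stragglers.

```python
# Hypothetical per-batch durations, in load order (seconds).
batch_durations_s = [2.0, 2.1, 2.0, 2.4, 2.6, 2.9]
batch_size = 100  # documents per batch (assumed)
threads = 4       # concurrent threads (assumed)

n = len(batch_durations_s)
head = batch_durations_s[: n // 2]  # "beginning of the load"
tail = batch_durations_s[n // 2 :]  # "end of the load"

avg_head_s = sum(head) / len(head)
avg_tail_s = sum(tail) / len(tail)

# Rough wall-clock total with `threads` batches in flight at once.
total_s = sum(batch_durations_s) / threads
rate_docs_per_s = (n * batch_size) / total_s
```

Comparing `avg_head_s` to `avg_tail_s` shows whether batches slow down as the load progresses, which is the point of capturing both.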
Determine if group-level cache settings should be adjusted.
Earlier versions of this document included the memory footprint of the largest forest, pegged to a backend version; I didn't capture a dataset version/description. We should probably document this and the other metrics in lux-backend-sizing.md:
- Jul 2022, Backend v1.0.4: largest forest was consuming 2,500 MB of memory (rounded up from ?)
- Jun 2022, Backend v1.0.3: 1,745 MB, rounded up to 2,000 MB