project-lux / lux-marklogic

Code, issues, and resources related to LUX MarkLogic

Review environment needs of latest dataset and index configuration (from 727) #22

Open gigamorph opened 4 months ago

gigamorph commented 4 months ago

Multiple v1.0.9 tickets are adding fields and field range indexes, and the dataset is changing. Both of those are grounds to perform the following.

  1. Capture and document metrics pegged to dataset and backend versions. Metrics of interest include:
    • Size of the lux-content database on disk (all forests). Compare to admin UI. Go with the larger of the two.
    • Expansion ratio: the size of the documents sent into MarkLogic relative to the resulting lux-content database size.
    • Number of documents and fragments.
    • Memory utilization/footprint.
    • I added some scripts collected over the years within sizingAndHealthCheck. They should be reviewed before use and may require updates, but they still have value. forest-report.xqy and forest-based-sizing-tool.xlsx are the most recent additions, and were created by me. The former's output is the input to the latter. It's a bit dense but could be worth becoming familiar with over time.
  2. Capture and document the ingestion rate along with the number of concurrent threads, the batch size, and whether the data is loaded by the pipeline (from memory) or by MLCP (from disk).
    • Average batch duration towards the beginning of the load.
    • Average batch duration towards the end of the load.
    • Total duration.
  3. Determine if group-level cache settings should be adjusted.
    • See and update the "Group-Level Caches" tab in lux-backend-sizing.xlsx.
    • Versions of this document include the memory footprint of the largest forest, pegged to a Backend version. I didn't capture a dataset version/description. We should probably document this and other metrics in lux-backend-sizing.md.
      • Jul 2022, Backend v1.0.4: largest forest was consuming 2,500 MB of memory (rounded up from ?)
      • Jun 2022, Backend v1.0.3: 1,745 MB, rounded up to 2,000 MB
  4. Ensure the Forest Reserve Requirement is met. The current database size and topology info can be entered on the "Forest Reserve" tab of lux-backend-sizing.xlsx to help with this calculation.
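
As a rough illustration of the arithmetic behind items 1 and 2, here is a minimal Python sketch for recording one load and deriving the expansion ratio and ingestion rate from it. The `LoadRun` record and its field names are hypothetical and are not part of the repo. It also assumes the expansion ratio is expressed as database size divided by source size; flip the division if the sizing spreadsheet uses the other convention.

```python
from dataclasses import dataclass


@dataclass
class LoadRun:
    """Hypothetical record of one ingestion run; every field name is illustrative."""
    dataset_version: str
    backend_version: str
    source_bytes: int      # size of the documents sent into MarkLogic
    database_bytes: int    # lux-content size (larger of on-disk vs. admin UI)
    document_count: int
    total_seconds: float   # total duration of the load
    threads: int           # number of concurrent threads
    batch_size: int
    loader: str            # e.g. "pipeline" (from memory) or "mlcp" (from disk)


def expansion_ratio(run: LoadRun) -> float:
    # Assumption: ratio = database size / source size.
    return run.database_bytes / run.source_bytes


def docs_per_second(run: LoadRun) -> float:
    # Average ingestion rate over the whole load.
    return run.document_count / run.total_seconds
```

Keeping the dataset and backend versions on the same record as the measurements is one way to keep the metrics pegged to both, as item 1 asks. Average batch durations near the start and end of the load could be added as further fields.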
brent-hartwig commented 2 months ago

@xinjianguo, this ticket is relevant to adjusting an environment's storage amount.

cc: @prowns, @clarkepeterf, @jffcamp