medic / cht-docs

Documentation site for the Community Health Toolkit
https://docs.communityhealthtoolkit.org

Clarify performance, capacity, technical capabilities to technical partners #83

Open kennsippell opened 4 years ago

kennsippell commented 4 years ago

Technical partners need to roughly calibrate their expectations for the scalability and capacity of cht-core.

We should attempt to communicate:

  1. Recommendations for server hardware based on project size
  2. How many users a given hardware configuration can support, and how many documents per user
  3. How many users can replicate at once during heavy data migrations, with guidelines for scenarios where many users need to do initial replication
  4. When partners should consider splitting an instance into multiple instances
  5. How long initial replication should take in ideal conditions (1000 docs should take x minutes/seconds)
  6. General system limits
  7. When designing a system, how many documents it is "healthy" for a user to have on their phone, and when users start getting warnings during replication

An example of a TP with an unclear understanding of the system's capacity/performance expectations: https://github.com/medic/cht-core/issues/6246

antonykhaemba commented 4 years ago

Hi @MaxDiz

In our previous discussions with the LG team, Nii (LG Director of Software Engineering) and the LG dev team asked us to share an estimate of the number of users our API can support, along with the other details @kennsippell highlighted in this issue. When might we be able to share this information so that LG can plan ahead? LG is planning to scale up to more branches and needs this information to make an informed decision.

cc @Enock1990 and @SCdF

MaxDiz commented 4 years ago

Good questions for our fearless Dir of Technology @garethbowen

garethbowen commented 4 years ago

This will require significant effort in terms of testing and should be scheduled as part of the upcoming backlog prioritisation session.

kennsippell commented 4 years ago

Starting to collect some anecdotal data points for "Recommendations for server hardware based on project size":

An instance with 18M docs on 3.6.1, running on a c5.4xlarge, needs to upgrade to a c5.9xlarge to avoid request timeouts during periods of heavy initial replication (hundreds of users replicating).

Some data for an SMS-only project on 3.7.1:

cdc-mohke-dsru.app sits on a 16-core, 64 GB RAM multi-homed server shared with 11 other projects. Over the last 7 days, the container has only burst to 1.75 GB of RAM usage and 0.3% CPU usage.

kennsippell commented 4 years ago

Here is a hardware question from MSF-Niger to illustrate this problem:

Worst case may be around 5000 docs. Here is the breakdown:
  1. 5 samu users, 500 docs each per day: 2500
  2. 5 riposte users, 220 docs each per day: 1100
  3. 20 investigator users, 100 docs each per day: 2000
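The breakdown above can be checked with a quick sketch. It sums the per-role daily volumes and projects how many documents would accumulate on the server over a period, assuming every day is a worst-case day and nothing is purged. `Role`, `daily_total`, and `project_docs` are hypothetical names used only for illustration. Note that the three lines add up to 5600 docs per day, somewhat above the "around 5000" estimate.

```python
from dataclasses import dataclass


@dataclass
class Role:
    """One user role from the MSF-Niger breakdown (hypothetical structure)."""
    name: str
    users: int
    docs_per_user_per_day: int

    def docs_per_day(self) -> int:
        # Total docs this role creates per day across all of its users.
        return self.users * self.docs_per_user_per_day


def daily_total(roles: list[Role]) -> int:
    """Worst-case documents created across all roles in one day."""
    return sum(role.docs_per_day() for role in roles)


def project_docs(roles: list[Role], days: int) -> int:
    """Docs accumulated after `days` worst-case days, ASSUMING nothing is purged."""
    return daily_total(roles) * days


roles = [
    Role("samu", users=5, docs_per_user_per_day=500),
    Role("riposte", users=5, docs_per_user_per_day=220),
    Role("investigator", users=20, docs_per_user_per_day=100),
]

for role in roles:
    print(f"{role.name}: {role.docs_per_day()} docs/day")
print("total per day:", daily_total(roles))
print("after one year:", project_docs(roles, 365))
```

With these inputs the sketch reports 5600 docs per day and 2,044,000 docs after a year of worst-case days, a figure that could be set against anecdotal data points such as the 18M-doc instance mentioned earlier.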

What hardware do we need?

And from MSF Goma:

What we are looking for: 10 app users, 6k texts. Does that equal 6k docs, Kenn? Is this gateway or AfricasTalking? If it's the latter, I would assume we would pass through, and perhaps we could compare against a webapp project with a similar number of docs created daily. I don't think Nepal is sending 6k texts a day (in one district)? They want some benchmarks, not just new hardware numbers.
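One rough way to put the two requests side by side is to compare documents created per user per day. This sketch assumes, as the question itself asks, that each of Goma's 6k texts becomes one doc and that the 6k figure is a daily volume; neither is confirmed in the thread. `docs_per_user_per_day` is a hypothetical helper.

```python
def docs_per_user_per_day(docs_per_day: float, users: int) -> float:
    """Average daily document creation per user (hypothetical helper)."""
    if users <= 0:
        raise ValueError("users must be positive")
    return docs_per_day / users


# MSF Goma: 10 app users, 6k texts. ASSUMES one text == one doc, and 6k per day.
goma = docs_per_user_per_day(6000, 10)

# MSF-Niger worst case: 2500 + 1100 + 2000 docs/day across 5 + 5 + 20 users.
niger = docs_per_user_per_day(2500 + 1100 + 2000, 5 + 5 + 20)

print(f"Goma:  {goma:.0f} docs/user/day")
print(f"Niger: {niger:.1f} docs/user/day")
```

Under those assumptions each Goma user would generate roughly three times the daily volume of an average MSF-Niger user (600 vs. about 187), even though the total daily volumes (6000 vs. 5600) are similar.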

What do we tell them?

MaxDiz commented 4 years ago

transferring to new cht-docs repo

kennsippell commented 3 years ago

A great doc from @nomulex with a model for hardware needs. https://docs.google.com/document/d/1Vr9ei9iusRfFl0V4wJK3bjDnaqQCXQbGkYPkbLdMVqo/edit

kennsippell commented 1 year ago

Proposal to establish clearer performance/reliability expectations via Service Level Objectives https://docs.google.com/document/d/11BWXPku5siAZpkI3YJnDPkgqInEWhK2XtkccfiVDxzE/edit#