isomerpages / isomercms-backend

A static website builder and host for the Singapore Government

IS-548: Add max concurrent git processes #936

Closed harishv7 closed 1 year ago

harishv7 commented 1 year ago

Problem

We are rolling out GGS to more sites, and we need to ensure that GGS scales as the number of concurrent sites increases.

Closes IS-548

Solution

In relation to https://github.com/isomerpages/isomer-tooling/pull/51, we found the following:

Having a shared simple-git instance for all repos works fine without errors, provided that we increase MAX_CONCURRENT_PROCESSES to be equal to the number of repos we want to support concurrently (see the configuration sketch below).

Assuming each Git process has an upper bound of 10MB of memory usage, and given a RAM size of 2GB, we can support a max of 200 processes.

Allowing for some buffer, we will set this number to 150 as an upper bound. Note that this is still far higher than our realistic expected usage.

We observed a total of 3500 calls per hour, which works out to roughly 60 per minute. Hence, 150 concurrent processes gives us a sufficiently large buffer.
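
To make the configuration concrete, here is a minimal sketch (not the actual isomercms-backend code) of a shared simple-git instance whose concurrency cap is read from an environment variable. The variable name MAX_CONCURRENT_GIT_PROCESSES, the default of 150, and the helper function are assumptions based on the numbers discussed in this PR; `maxConcurrentProcesses` itself is a real simple-git option.

```ts
import simpleGit, { SimpleGit } from "simple-git"

// Assumed env var name, following this PR's discussion; default to the 150
// upper bound derived from 2GB RAM / ~10MB per Git process, with buffer.
const MAX_CONCURRENT_GIT_PROCESSES = parseInt(
  process.env.MAX_CONCURRENT_GIT_PROCESSES ?? "150",
  10
)

// simple-git queues commands internally and spawns at most
// maxConcurrentProcesses git child processes at any one time.
const git: SimpleGit = simpleGit({
  maxConcurrentProcesses: MAX_CONCURRENT_GIT_PROCESSES,
})

// Hypothetical helper: all repos share the same instance, so concurrent
// clones across sites never exceed the configured cap.
export const cloneRepo = (repoUrl: string, destinationPath: string) =>
  git.clone(repoUrl, destinationPath)
```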

Breaking Changes

Tests

Ensure all unit tests pass

harishv7 commented 1 year ago

Changes themselves look fine to me - what I want clarity on is how we got the following:

Assuming each Git process has an upper bound of 10MB of memory usage, and given a RAM size of 2GB

because these are important numbers that will affect how we choose the eventual MAX_CONCURRENT_GIT_PROCESSES. Do also take note that the max Git processes will also depend on your system's operating capacity - are we ever going to change it? (Also, what is it at the moment?)

@seaerchin

Regarding the 10MB figure, we checked the maximum memory taken by Git processes in the macOS Activity Monitor - the highest I saw was about 4MB, but we rounded up to 10MB as an upper bound per Git process.

A t2.small instance has 2GB of RAM. To scale, we would increase the instance capacity, but that is unlikely to be required for the foreseeable future. Here, concurrent means at the exact same time, and based on our metrics that is around 60 requests per minute.
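
As a quick sanity check on those figures, here is a back-of-envelope sketch (assumed values only, nothing measured in this snippet) showing how the 150 cap sits below the theoretical maximum:

```ts
// Assumptions from the discussion above: t2.small => 2GB RAM, and each Git
// process bounded at ~10MB (observed ~4MB, rounded up for safety).
const TOTAL_RAM_MB = 2 * 1024
const PER_GIT_PROCESS_MB = 10

// ~200 processes would exhaust RAM if Git were the only thing running.
const theoreticalMax = Math.floor(TOTAL_RAM_MB / PER_GIT_PROCESS_MB) // ~204

// The chosen cap of 150 (~1.5GB worst case) leaves roughly a quarter of the
// RAM for the rest of the application, and is still far above the observed
// ~60 requests per minute.
const MAX_CONCURRENT_GIT_PROCESSES = 150

console.log({ theoreticalMax, MAX_CONCURRENT_GIT_PROCESSES })
```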

To be cautious, we should still observe memory usage on the instance for a while. I believe DD has this? And AWS also provides some monitoring, I believe.