mrc-ide / covid-sim

This is the COVID-19 CovidSim microsimulation model developed by the MRC Centre for Global Infectious Disease Analysis hosted at Imperial College London.
GNU General Public License v3.0

Regression Test Heap Optimization: Remove Extinct Result Arrays #433

Open ozmorph opened 4 years ago

ozmorph commented 4 years ago

Objective

Reduce RAM usage in the regression tests so that GitHub Actions runners can run all 4 tests without paging memory.

Background

The size of the Results struct in Model.h is nearly 805 KB on Clang 9 for Linux. Since TSMeanE and TSVarE are each arrays of P.NumSamples such structs (720 in the regression tests), each requires an allocation of around 580 MB. As a result, a single regression test uses over 1.1 GB of RAM for these two arrays alone, and when all 4 tests are run in parallel they consume about 4.4 GB of RAM.
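To make the arithmetic concrete, here is a minimal, self-contained C++ sketch; the placeholder member is an assumption used only to approximate the reported ~805 KB struct size, not the real Results definition:

```cpp
#include <cstdio>

// Stand-in for the real Results struct in Model.h: 100,000 doubles give a
// size of roughly 800 KB, close to the ~805 KB reported for Clang 9 on Linux.
struct Results {
    double placeholder[100000];
};

int main() {
    const double NumSamples = 720.0;  // P.NumSamples in the regression tests
    const double perArray = sizeof(Results) * NumSamples / 1.0e6;  // MB per array
    std::printf("TSMeanE or TSVarE alone : %.0f MB\n", perArray);        // ~576 MB
    std::printf("both arrays, one test   : %.0f MB\n", 2.0 * perArray);  // ~1150 MB
    std::printf("both arrays, 4 tests    : %.0f MB\n", 8.0 * perArray);  // ~4600 MB
    return 0;
}
```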

Rationale

As noted in #388, the default GitHub Actions runners have only 7 GB of total RAM and a 14 GB temporary storage device allocated to them. For Linux environments the swap space is set to 4 GB, and on Windows the swap space is most likely set to 2 GB.

Local runs using heaptrack indicate that the average heap size of a single CovidSim run is about 3.5 GB. Accounting for the fact that the US regression tests run for a fraction of the time that the UK tests do, let's assume that the regression tests use 7 GB of RAM on average and 14 GB at peak. Windows Server 2019 requires 512 MB of RAM just to run, so if 7 GB is the total amount of RAM given to the runners, there is not much room left for the operating system itself. I therefore surmise that CovidSim is hitting paged memory for at least the first half of a ctest -j6 -V run.

Solution

This pull request removes the TSMeanE and TSVarE arrays from the code, since they have not contributed to the final output since the initial squash of commits back in April. The regression tests still pass.

Known Concerns

If this code is still used on the supercomputers, or anywhere outside of the regression tests listed on GitHub, then please close this PR and I will open a new PR that documents its use so that no one attempts to propose this change again in the future.

Alternative/Additional Solution

The memory requirements of the Results arrays could also be reduced by using heap-allocated C-style arrays or std::vectors instead of static double arrays of MAX_ADUNITS size (which is set to 3200, even though the regression tests only use 1 or 2 elements). However, this would require far deeper and more extensive changes to the code base than this pull request makes.
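A rough sketch of that alternative, assuming hypothetical field names rather than the actual Results members: a fixed MAX_ADUNITS-sized member becomes a std::vector sized to the number of admin units the run actually uses.

```cpp
#include <cstddef>
#include <vector>

constexpr int MAX_ADUNITS = 3200;   // compile-time upper bound used today

// Today (simplified): every per-admin-unit field reserves MAX_ADUNITS slots,
// even when a regression test only touches 1 or 2 admin units.
struct ResultsFixed {
    double incidence_by_adunit[MAX_ADUNITS];   // hypothetical field name
};

// Alternative (simplified): size the storage to the run's actual admin units.
struct ResultsDynamic {
    std::vector<double> incidence_by_adunit;

    explicit ResultsDynamic(std::size_t num_adunits)
        : incidence_by_adunit(num_adunits, 0.0) {}
};
```

The trade-off is that every site that initializes or indexes these fields would need to be reviewed, which is why this is described above as a far deeper change than the present PR.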

ozmorph commented 4 years ago

Sounds good! I like having a check as an alternative.

I am also surprised that it did not affect run-time. It could also be that the paging algorithm and the Windows allocators are good enough to hide this cost. I'm going to try to replicate the CI binaries on my Windows system and compare the performance against my Linux runs to determine where the real difference lies.

Good point about running only 2 tests at once on the CI.

ozmorph commented 4 years ago

@matt-gretton-dann Using the exact same compiler that GitHub Actions uses, I get similar performance on my local machine to what Clang 9 on WSL Ubuntu 20.04 gives. This led me down a rabbit hole, but I think your earlier theory that all of the fprintf() calls are responsible for the slowdown is probably correct.

It appears that I/O performance is a common issue for GitHub Actions runners (especially on Windows), and this project is not the only one to be affected by it. It also explains why Windows performance is so drastically different from Linux and Mac.
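As an aside (not something this PR touches), one common way to blunt the cost of many small fprintf() calls on a slow filesystem is to give the output stream a larger user-space buffer, so the OS sees far fewer write calls. A minimal sketch, with a hypothetical output file name:

```cpp
#include <cstdio>
#include <vector>

int main() {
    std::FILE* out = std::fopen("results.tsv", "w");   // hypothetical output file
    if (!out) return 1;

    // Ask stdio to fully buffer through a 1 MB buffer, so thousands of small
    // fprintf() calls are coalesced into far fewer OS-level writes.
    std::vector<char> buf(1 << 20);
    std::setvbuf(out, buf.data(), _IOFBF, buf.size());

    for (int sample = 0; sample < 720; ++sample)
        std::fprintf(out, "%d\t%f\n", sample, sample * 0.5);

    // fclose() flushes and detaches the buffer; the buffer must outlive the
    // stream, which it does here because buf is destroyed after this returns.
    std::fclose(out);
    return 0;
}
```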

To conclude, reducing memory usage on its own is not going to make the regression tests faster, but I think it would provide the extra headroom needed to track more variables in additional simulations in the future.

matt-gretton-dann commented 4 years ago

This led me down a rabbit hole, but I think your earlier theory that all of the fprintf() calls are responsible for the slowdown is probably correct.

I have a WIP branch that enables us to control output verbosity, which should show whether I am right or not.
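For illustration, here is a minimal sketch of what such a verbosity gate could look like; this is an assumption about the general approach, not the contents of the WIP branch:

```cpp
#include <cstdarg>
#include <cstdio>

// Global log level, e.g. set from a command-line option:
// 0 = quiet (CI), 1 = normal, 2 = debug.
static int g_verbosity = 1;

// Drop-in wrapper for diagnostic fprintf(stderr, ...) calls: messages above
// the current verbosity level are skipped entirely, so their I/O cost
// disappears when the CI requests a quiet run.
static void log_at(int level, const char* fmt, ...) {
    if (level > g_verbosity) return;
    va_list args;
    va_start(args, fmt);
    std::vfprintf(stderr, fmt, args);
    va_end(args);
}

int main() {
    g_verbosity = 0;                               // what the CI might request
    log_at(2, "debug: sample %d finished\n", 42);  // suppressed
    log_at(0, "run complete\n");                   // still printed
    return 0;
}
```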