openghg / openghg_inversions

University of Bristol Atmospheric Chemistry Research Group RHIME Inversion code (with openghg dependency)
MIT License

Speed improvements for HBMCMC - extra info #17

Open aliceramsden opened 1 year ago

aliceramsden commented 1 year ago

The table below contains info on the runtimes for various HBMCMC setups, from the code at commit https://github.com/openghg/openghg_inversions/commit/8d2ecf11f93a376e83422ca169f4951869df314e.

No filtering was applied to the 4-hourly averaged observations.

Trace parameters were set at: nit = 250000 burn = 50000 tune = 125000

Ideally, the largest model runs (20 sites, 500 basis functions) should be brought down to 8 hours, but this target can definitely be adjusted if needed.


| Inputs | Num. obs | Run ID | Run time | Memory usage |
| -- | -- | -- | -- | -- |
| n_basis = 50, n_sites = 10 | 1741 | 5411818 | 2H 15mins | 16 GB |
| n_basis = 100, n_sites = 10 | 1741 | 5411819 | 3H 15mins | 16 GB |
| n_basis = 500, n_sites = 10 | 1741 | 5418801 | 5H 50mins | 20 GB |
|   |   |   |   |   |
| n_basis = 100, n_sites = 15 | 2626 | 5411820 | 4H 45mins | 23 GB |
| n_basis = 500, n_sites = 15 | 2626 | 5418813 | 12H 50mins | 26 GB |
|   |   |   |   |   |
| n_basis = 100, n_sites = 20 | 3377 | 5411821 | 6H 30mins | 30 GB |
| n_basis = 200, n_sites = 20 | 3377 | 5460436 | 11H 30mins | 30 GB |
| n_basis = 500, n_sites = 20 | 3377 | 5418855 | 16H 50mins | 31 GB |
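For reference, a minimal sketch of checking the run times in this table against the 8-hour target. The helper `parse_runtime` and the `RUNS` list are illustrative assumptions, copied by hand from the table; they are not part of openghg_inversions.

```python
import re

# (n_sites, n_basis, run time) copied by hand from the table in this comment
RUNS = [
    (10, 50, "2H 15mins"),
    (10, 100, "3H 15mins"),
    (10, 500, "5H 50mins"),
    (15, 100, "4H 45mins"),
    (15, 500, "12H 50mins"),
    (20, 100, "6H 30mins"),
    (20, 200, "11H 30mins"),
    (20, 500, "16H 50mins"),
]

TARGET_HOURS = 8.0  # target stated in this comment


def parse_runtime(text):
    """Convert a string like '12H 50mins' to hours (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)H\s+(\d+)mins", text)
    if match is None:
        raise ValueError(f"unrecognised run time: {text!r}")
    hours, minutes = map(int, match.groups())
    return hours + minutes / 60


# Keep only the runs that exceed the target
over_target = [
    (sites, basis, parse_runtime(rt))
    for sites, basis, rt in RUNS
    if parse_runtime(rt) > TARGET_HOURS
]

for sites, basis, hours in over_target:
    ratio = hours / TARGET_HOURS
    print(f"n_sites={sites}, n_basis={basis}: {hours:.2f} h ({ratio:.1f}x target)")
```

With these numbers, three setups are over the target: 15 sites with 500 basis functions, and 20 sites with 200 or 500 basis functions.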

This links to a larger task in the project overview: https://github.com/ACRG-Bristol/projects/issues/30

brendan-m-murphy commented 1 year ago

Thanks for these benchmarks Alice. Once I have some ideas to test out, maybe you could share the details or rerun some of these from a new branch.

brendan-m-murphy commented 1 year ago

@aliceramsden are these runs based off the paris no filtering .ini files in the paris_total_methane_inversion directory on BP1?

Do you know if these .ini files just depend on the PR that I reverted? I should be able to add those changes back in soon.

aliceramsden commented 1 year ago

I've realised I've been moving some files around that paris_total_methane_inversion directory so I can run some different versions of the model (sorry, that's very unhelpful of me!).

So the .ini files in paris_total_methane_inversion/hbmcmc_input_output/all_sites are now the full model runs with all the sites, which I won't move from that folder now. If you want to recreate the tests detailed above in that table, you could start with one of those .ini files and reduce the number of sites, or the number of basis functions.

And yes, I think I used a branch of the code which had the data preprocessing (the ModelScenario and footprints_sensitivity etc. bits) pulled out of the main hbmcmc function.

If you wanted to continue with these speed tests but with the current main/develop version of the repository (without those changes added back in), you could just pull out the list of sites, fluxes, bc etc. given in these .ini files and use them with the standard .ini file format. Hopefully that will still work.

Let me know if you need any more info

brendan-m-murphy commented 5 months ago

Some preliminary results, for CH4 based off of the PARIS ini files. This is for January 2021. I saved the merged data ahead of time. For 20 sites, that took about 30 minutes and 7.5 GB of memory.

Other parameters:

The merged data was loaded from a .zarr file. I'm not sure if this is responsible for the lowered memory usage; there was also a performance problem that occurred with combine_datasets occasionally, which was maybe inflating the memory use and running times. I'll try to recreate the table Alice made once I have the set up for doing these testing runs refined a bit (currently it's a lot of work to do manually).

| sites | nbasis | xprior | runtime (numpyro) | runtime (pymc) | memory (numpyro) | memory (pymc) |
| -- | -- | -- | -- | -- | -- | -- |
| 10 | 50 | lognormal | 00:03:21 | 01:09:04 | 1.69 GB | 1.83 GB |
| 10 | 50 | truncnorm | 00:03:26 | 00:43:55 | 1.87 GB | 1.78 GB |
| 10 | 100 | lognormal | 00:05:16 | 01:10:45 | 1.70 GB | 1.87 GB |
| 10 | 100 | truncnorm | 00:04:17 | 00:51:45 | 1.71 GB | 1.87 GB |
| 10 | 500 | lognormal | 00:15:28 | 01:21:45 | 2.02 GB | 2.76 GB |
| 10 | 500 | truncnorm | 00:13:56 | 01:50:51 | 2.92 GB | 2.76 GB |
| 15 | 100 | lognormal | 00:06:14 | 01:31:21 | 2.27 GB | 2.43 GB |
| 15 | 100 | truncnorm | 00:07:15 | 01:03:08 | 2.28 GB | 2.45 GB |
| 15 | 500 | lognormal | 00:23:48 | 03:23:47 | 3.51 GB | 3.33 GB |
| 15 | 500 | truncnorm | 00:14:49 | 05:58:35 | 3.52 GB | 3.33 GB |
| 20 | 100 | lognormal | 00:10:53 | 04:38:34 | 2.87 GB | 3.05 GB |
| 20 | 100 | truncnorm | 00:06:45 | 02:48:50 | 2.87 GB | 3.09 GB |
| 20 | 200 | lognormal | 00:09:20 | 03:08:36 | 2.97 GB | 3.27 GB |
| 20 | 200 | truncnorm | 00:11:06 | 02:58:02 | 3.38 GB | 3.26 GB |
| 20 | 500 | lognormal | 00:21:32 | 03:20:58 | 4.16 GB | 3.93 GB |
| 20 | 500 | truncnorm | 00:24:09 | 03:09:18 | 4.17 GB | 3.94 GB |
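
To make the sampler comparison easier to read, here is a minimal sketch that turns the runtimes in this table into pymc/numpyro ratios. The `ROWS` list is copied by hand from the table, and `to_seconds` is a hypothetical helper, not something from the repository.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to seconds (hypothetical helper)."""
    hours, minutes, seconds = map(int, hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


# (sites, nbasis, xprior, numpyro runtime, pymc runtime), copied from the table
ROWS = [
    (10, 50, "lognormal", "00:03:21", "01:09:04"),
    (10, 50, "truncnorm", "00:03:26", "00:43:55"),
    (10, 100, "lognormal", "00:05:16", "01:10:45"),
    (10, 100, "truncnorm", "00:04:17", "00:51:45"),
    (10, 500, "lognormal", "00:15:28", "01:21:45"),
    (10, 500, "truncnorm", "00:13:56", "01:50:51"),
    (15, 100, "lognormal", "00:06:14", "01:31:21"),
    (15, 100, "truncnorm", "00:07:15", "01:03:08"),
    (15, 500, "lognormal", "00:23:48", "03:23:47"),
    (15, 500, "truncnorm", "00:14:49", "05:58:35"),
    (20, 100, "lognormal", "00:10:53", "04:38:34"),
    (20, 100, "truncnorm", "00:06:45", "02:48:50"),
    (20, 200, "lognormal", "00:09:20", "03:08:36"),
    (20, 200, "truncnorm", "00:11:06", "02:58:02"),
    (20, 500, "lognormal", "00:21:32", "03:20:58"),
    (20, 500, "truncnorm", "00:24:09", "03:09:18"),
]

# Ratio of pymc runtime to numpyro runtime for each setup
speedups = {
    (sites, nbasis, xprior): to_seconds(pymc) / to_seconds(numpyro)
    for sites, nbasis, xprior, numpyro, pymc in ROWS
}

for (sites, nbasis, xprior), ratio in speedups.items():
    print(f"{sites:>2} sites, {nbasis:>3} basis, {xprior}: numpyro is {ratio:.1f}x faster")
```

These ratios only compare sampling runs for January 2021 with pre-merged data, so they are not directly comparable to the run times in Alice's original table.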