lesgourg / class_public

Public repository of the Cosmic Linear Anisotropy Solving System (master for the most recent version of the standard code; GW_CLASS to include Cosmic Gravitational Wave Background anisotropies; classnet branch for acceleration with neural networks; ExoCLASS branch for exotic energy injection; class_matter branch for FFTlog)

High memory usage when increasing precision settings #521

Open gplynch619 opened 1 year ago

gplynch619 commented 1 year ago

Hello all,

I am trying to compute a batch of CMB power spectra with increased precision on a cluster, and with some settings configurations I am encountering (seemingly) anomalous spikes in RES memory usage once the calculation has entered the lensing module. This only seems to occur when I use a high number of OpenMP threads (at least 16 or above, but I have not tested what the lower limit is). Here is a plot comparing the memory usage of 5 different jobs I submitted on the cluster. Each of these jobs had OMP_NUM_THREADS=32 (I am trying to find an optimal thread count to use, and am currently trying 32). The x-axis is technically the number of top updates, which I set to 10/s, so it is roughly in units of 0.1 seconds. I used default CLASS settings, with the following precision settings modified:

Job 1: l_max_scalars=2500, succeeded
Job 2: l_max_scalars=3000, succeeded
Job 3: l_max_scalars=4000, succeeded
Job 4: l_max_scalars=10000, failed
Job 5: l_max_scalars=4000, accurate_lensing=1, failed

This is just one example; other configurations also failed, and I can provide a larger list if needed, but I am sharing this just to get the ball rolling. Perhaps I am underestimating the memory requirements when increasing precision for lensing computations. I was wondering if anyone has any ideas here.

[Figure: RES memory usage over time for the five jobs]
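(Editorial note: for reference, here is a minimal sketch of one way such a RES trace can be recorded from Python. The plots in this issue were made from top updates at 10/s; the psutil-based polling below is a substitute for top, not the reporter's actual script, and trace_rss is a hypothetical helper name.)

    import time
    import psutil

    def trace_rss(pid, interval=0.1):
        """Sample the resident set size (RES) of process `pid` every `interval` seconds."""
        proc = psutil.Process(pid)
        samples = []
        start = time.time()
        try:
            while proc.is_running():
                # record (elapsed seconds, RES in MiB)
                samples.append((time.time() - start, proc.memory_info().rss / 2**20))
                time.sleep(interval)
        except psutil.NoSuchProcess:
            pass  # the monitored CLASS job has finished
        return samples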

gplynch619 commented 1 year ago

Well, after some more tests I think this is probably not a bug, but rather me underestimating the memory requirements for these higher-precision calculations. I was able to run the configuration I needed, i.e.:

precision_settings = {'accurate_lensing': 1,
                      'k_max_tau0_over_l_max': 25,
                      'perturbations_sampling_stepsize': 0.05,
                      'l_max_scalars': 10000,
                      'non_linear': 'hmcode',
                      'eta_0': 0.603,
                      'c_min': 3.13
                      }

The fix was to increase the memory allocated to each CPU, which on my cluster corresponds to a thread: setting --mem-per-cpu=500M alleviated the problem, though it could perhaps go lower. I ran CLASS with these precision settings, with 1 MPI task and 32 OMP threads. Below is the memory usage of the overall task, where the x-axis is once again roughly in units of 0.1 seconds. The total memory usage spikes up to 12 GB! I will leave this issue open for the time being to get confirmation that this is the expected memory usage for higher precision, but as I said, I no longer believe this to be a bug!

[Screenshot 2023-05-19 at 4:03:52 PM: total memory usage of the task over time, peaking around 12 GB]
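(Editorial note: a minimal sketch, not the reporter's actual driver script, of how the precision_settings dict above could be passed to the classy Python wrapper for a single-task, 32-thread run. The 'output' and 'lensing' values are assumptions needed to reach the lensing module; cosmological parameters are left at the CLASS defaults, as in the original tests.)

    import os
    # must be set before the OpenMP runtime starts: one MPI task, 32 OpenMP threads
    os.environ.setdefault('OMP_NUM_THREADS', '32')

    from classy import Class

    precision_settings = {'accurate_lensing': 1,
                          'k_max_tau0_over_l_max': 25,
                          'perturbations_sampling_stepsize': 0.05,
                          'l_max_scalars': 10000,
                          'non_linear': 'hmcode',
                          'eta_0': 0.603,
                          'c_min': 3.13}

    cosmo = Class()
    cosmo.set({'output': 'tCl,pCl,lCl', 'lensing': 'yes'})  # lensed CMB spectra
    cosmo.set(precision_settings)
    cosmo.compute()                  # the reported RES spikes occur in the lensing step

    cls = cosmo.lensed_cl(10000)     # dict with 'tt', 'ee', 'te', 'pp', ..., up to l_max_scalars
    cosmo.struct_cleanup()
    cosmo.empty()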