philomat opened this issue 3 years ago
I'm not sure if it's the same thing, but in my simulations the memory usage also gradually increases over time. I don't know whether the cause is my code or gpt, but eventually the program bails out because it uses up all the memory on my computer.
I was just trying to run a couple of tests on Summit. Unfortunately, all of my tests ran out of memory, and I do not understand why.
I am trying to run a simple measurement of a pion correlator on a 64^4 lattice with physical pion mass, using the Wilson clover Dirac operator. I am using a 2-level multigrid solver that I basically took from tests/algorithms/multigrid.py; the only difference is that I use a g.mspincolor(grid) as the rhs.
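For context, the relevant part of my script looks roughly like this (the kappa/csw values and the path are placeholders, and the multigrid construction is the one from tests/algorithms/multigrid.py, which I omit here):

import gpt as g

# load the gauge field; grid is 64^4 in double precision
U = g.load("/path/to/config")  # placeholder path
grid = U[0].grid
g.mem_report()                 # first report below

# Wilson clover operator (parameter values are placeholders)
w = g.qcd.fermion.wilson_clover(
    U,
    {
        "kappa": 0.137,
        "csw_r": 1.0,
        "csw_t": 1.0,
        "xi_0": 1.0,
        "nu": 1.0,
        "isAnisotropic": False,
        "boundary_phases": [1.0, 1.0, 1.0, -1.0],
    },
)

# point source as a full spin-color matrix; this is the only change
# relative to tests/algorithms/multigrid.py, which uses a single vector
src = g.mspincolor(grid)
g.create.point(src, [0, 0, 0, 0])
g.mem_report()                 # second report below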
Here is some output from a few g.mem_report() calls that I put into my code at several places:
After loading the gauge field:
GPT : 131.608838 s : ====================================================================================================================================
GPT : 131.608853 s : GPT Memory Report
GPT : 131.608865 s : ====================================================================================================================================
GPT : 131.608876 s : Index Grid Precision OType CBType Size/GB Created at time
GPT : 131.608912 s : 0 [64, 64, 64, 64] double ot_matrix_su_n_fundamental_group(3) full 2.25 122.323134 s
GPT : 131.608932 s : 1 [64, 64, 64, 64] double ot_matrix_su_n_fundamental_group(3) full 2.25 122.784528 s
GPT : 131.608951 s : 2 [64, 64, 64, 64] double ot_matrix_su_n_fundamental_group(3) full 2.25 123.230915 s
GPT : 131.608968 s : 3 [64, 64, 64, 64] double ot_matrix_su_n_fundamental_group(3) full 2.25 123.696448 s
GPT : 131.608979 s : ====================================================================================================================================
GPT : 131.608989 s : Lattice fields on all ranks 9 GB
GPT : 131.608999 s : Lattice fields per rank 0.140625 GB
GPT : 131.609009 s : Resident memory per rank 3.50134 GB
GPT : 131.609019 s : Total memory available (host) 504.182 GB
GPT : 131.609030 s : Total memory available (accelerator) 9.61707 GB
GPT : 131.609038 s : ====================================================================================================================================
After setting the point src:
GPT : 131.801943 s : ====================================================================================================================================
GPT : 131.801998 s : GPT Memory Report
GPT : 131.802015 s : ====================================================================================================================================
GPT : 131.802032 s : Index Grid Precision OType CBType Size/GB Created at time
GPT : 131.802074 s : 0 [64, 64, 64, 64] double ot_matrix_su_n_fundamental_group(3) full 2.25 122.323134 s
GPT : 131.802102 s : 1 [64, 64, 64, 64] double ot_matrix_su_n_fundamental_group(3) full 2.25 122.784528 s
GPT : 131.802129 s : 2 [64, 64, 64, 64] double ot_matrix_su_n_fundamental_group(3) full 2.25 123.230915 s
GPT : 131.802156 s : 3 [64, 64, 64, 64] double ot_matrix_su_n_fundamental_group(3) full 2.25 123.696448 s
GPT : 131.802183 s : 4 [64, 64, 64, 64] double ot_matrix_spin_color(4,3) full 36 131.609090 s
GPT : 131.802198 s : ====================================================================================================================================
GPT : 131.802214 s : Lattice fields on all ranks 45 GB
GPT : 131.802230 s : Lattice fields per rank 0.703125 GB
GPT : 131.802246 s : Resident memory per rank 3.5094 GB
GPT : 131.802261 s : Total memory available (host) 503.6 GB
GPT : 131.802279 s : Total memory available (accelerator) 9.61707 GB
GPT : 131.802292 s : ====================================================================================================================================
After the multigrid setup (here I only quote the summary without the details, but there are an additional 30 instances of ot_vector_spin_color(4,3), each with a size of 3 GB):
GPT : 345.357736 s : ====================================================================================================================================
GPT : 345.357749 s : Lattice fields on all ranks 135 GB
GPT : 345.357763 s : Lattice fields per rank 2.10938 GB
GPT : 345.357776 s : Resident memory per rank 4.46045 GB
GPT : 345.357788 s : Total memory available (host) 496.169 GB
GPT : 345.357802 s : Total memory available (accelerator) 9.21082 GB
GPT : 345.357813 s : ====================================================================================================================================
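These sizes are consistent with a quick count of the degrees of freedom (my own arithmetic, not gpt output; 16 bytes per complex double):

sites = 64**4                            # 16777216 lattice sites
print(sites * 3 * 3 * 16 / 2**30)        # gauge link field:  2.25 GB
print(sites * 4 * 3 * 16 / 2**30)        # spin-color vector: 3.0 GB
print(sites * (4 * 3)**2 * 16 / 2**30)   # spin-color matrix: 36.0 GB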
So up to this point everything seems fine. The only thing that confuses me is the "Total memory available (accelerator)": I am running on 64 GPUs and would expect this number to be bigger, but maybe it is just the memory available on a single GPU. I also do not understand what "Resident memory per rank" means.
Now, I define my propagator and start the inversions by calling:
dst = g.eval(fgmres_outer(w) * src)
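Here fgmres_outer is the preconditioned outer solver built essentially as in tests/algorithms/multigrid.py; schematically (the tolerances are placeholders, and mg_prec stands for the 2-level multigrid preconditioner set up as in that test):

i = g.algorithms.inverter
# mg_prec = ... 2-level multigrid preconditioner, as in tests/algorithms/multigrid.py ...
fgmres_outer = i.fgmres({"eps": 1e-6, "maxiter": 1000, "restartlen": 20, "prec": mg_prec})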
And I immediately run out of memory:
cudaMalloc failed for 603979776 out of memory
Obviously, the inverter needs additional memory, not only for the propagator but also for all the helper fields used internally, but why more than 800 GB?
My initial guess is that the inverter tries to run on a single GPU and therefore runs out of memory. But how do I tell it to distribute the work over all cards?
I tried running the program with:
jsrun --smpiargs="-gpu" -n 64 -c 6 -a 1 -g 1 python pion_2lvl_mg.py --mpi 2.2.4.4
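Incidentally, if I do the arithmetic myself, the numbers at least look as if the lattice is being distributed: --mpi 2.2.4.4 gives 2*2*4*4 = 64 ranks, matching -n 64, and the size of the failed cudaMalloc is exactly one mspincolor field on the local volume of a single rank:

local_sites = (64 // 2) * (64 // 2) * (64 // 4) * (64 // 4)  # 262144 sites per rank
print(local_sites * (4 * 3)**2 * 16)                         # 603979776 bytes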