Setting the flag on runs that use around 30-80% of the total GPU memory often causes the program to crash with a CUDA out-of-memory error. For example, the attached run succeeds with the flag unset but crashes when it is set (all else identical). Profiled in the non-crashing case, it uses only 13.2 GB of VRAM (per k-point, presumably), and each card has 40 GB of VRAM. Yet if I set JDFTX_MEMPOOL_SIZE to 38000, it runs out of memory.
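For reference, this is roughly how the two cases were launched; the binary name, input file name, and MPI launcher below are assumptions, only the environment variable is from the actual runs (JDFTX_MEMPOOL_SIZE is in MB):

```shell
# Crashing case: request a ~38 GB pool on each 40 GB card
export JDFTX_MEMPOOL_SIZE=38000
echo "mempool: ${JDFTX_MEMPOOL_SIZE} MB"
# mpirun -n 8 jdftx_gpu -i test.in   # (assumed command) CUDA out-of-memory

# Succeeding case: same run with the variable unset
unset JDFTX_MEMPOOL_SIZE
# mpirun -n 8 jdftx_gpu -i test.in   # (assumed command) completes, ~13.2 GB peak
```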
Note that this calculation exists purely to test this issue, so please ignore that the number of folded k-points (9, reduced to 4) is only half the number of cards (8), which would be wasteful in practice.
Untitled.txt