shankar1729 / jdftx

JDFTx: software for joint density functional theory
http://jdftx.org

JDFTX_MEMPOOL_SIZE causes memory allocation errors when memory usage >= JDFTX_MEMPOOL_SIZE / 4 #361

Open ColinBundschu opened 2 hours ago

ColinBundschu commented 2 hours ago

In many cases, setting JDFTX_MEMPOOL_SIZE on runs that use roughly 30-80% of total GPU memory makes the program crash with a CUDA out-of-memory error. For example, the attached run succeeds when JDFTX_MEMPOOL_SIZE is not set but crashes when it is set, with everything else identical. Profiling the non-crashing run shows that it uses only 13.2 GB of VRAM (presumably per k-point), while each card has 40 GB of VRAM. Yet setting JDFTX_MEMPOOL_SIZE to 38000 makes it run out of memory.
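These numbers can be checked against the threshold stated in the issue title (failure once usage >= JDFTX_MEMPOOL_SIZE / 4). The sketch below assumes JDFTX_MEMPOOL_SIZE is given in MB and treats 1 GB as 1000 MB. The helper `predicted_to_crash` is hypothetical: it only encodes the title's empirical heuristic, not JDFTx's actual allocator logic.

```python
# Hypothetical check of the heuristic from the issue title:
# runs fail once memory usage >= JDFTX_MEMPOOL_SIZE / 4.
# Assumptions: JDFTX_MEMPOOL_SIZE is in MB, and 1 GB = 1000 MB.

def predicted_to_crash(usage_gb: float, mempool_size_mb: float) -> bool:
    """Return True if usage reaches a quarter of the pool size."""
    threshold_gb = mempool_size_mb / 4 / 1000
    return usage_gb >= threshold_gb

usage_gb = 13.2          # profiled VRAM use of the non-crashing run
mempool_size_mb = 38000  # pool size that triggered the out-of-memory error
card_gb = 40             # VRAM per card

print(f"fraction of card used: {usage_gb / card_gb:.0%}")
print(f"threshold: {mempool_size_mb / 4 / 1000:.1f} GB")
print("predicted crash:", predicted_to_crash(usage_gb, mempool_size_mb))
# Smallest pool that would stay under the heuristic threshold for this run.
print(f"pool needed: {4 * usage_gb:.1f} GB (card has {card_gb} GB)")
```

If the heuristic holds, this run would need a pool of at least 4 × 13.2 GB = 52.8 GB, more than a 40 GB card provides, so no JDFTX_MEMPOOL_SIZE value would let it complete; only leaving the variable unset works.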

Note that this calculation exists purely to test this issue, so please ignore that the number of reduced k-points (9 folded k-points reduced to 4) is half the number of cards (8), which would be wasteful in practice.

Attachment: Untitled.txt
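The remark about wasted cards comes down to simple arithmetic. As a rough sketch, assume one process per GPU and treat each reduced k-point as an indivisible unit of work. This is a simplification, and the hypothetical `distribute` helper is not JDFTx's actual scheduler. Under these assumptions, 4 reduced k-points spread over 8 cards leave half the cards without work:

```python
# Sketch: block-distribute reduced k-points over GPUs, one process per GPU.
# Assumption: k-points are the only unit of parallel work (a simplification,
# not necessarily how JDFTx divides work).

def distribute(n_kpoints: int, n_gpus: int) -> list[list[int]]:
    """Assign k-point indices to GPUs in contiguous blocks, as evenly as possible."""
    base, extra = divmod(n_kpoints, n_gpus)
    assignment, start = [], 0
    for gpu in range(n_gpus):
        count = base + (1 if gpu < extra else 0)
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment

plan = distribute(n_kpoints=4, n_gpus=8)
idle = sum(1 for kpoints in plan if not kpoints)
print(plan)
print(f"{idle} of {len(plan)} GPUs idle")
```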

ColinBundschu commented 2 hours ago

Additionally, this issue occurs on both Polaris and Jetstream2, so it seems to be platform-independent.