paboyle / Grid

Data parallel C++ mathematical object library
GNU General Public License v2.0

Memory issues after deallocating shared memory region (Cori GPU) #237

Open giltirn opened 5 years ago

giltirn commented 5 years ago

In my CPS A2A code I have always deallocated Grid's shared memory region with Grid::GlobalSharedMemory::SharedMemoryFree() once I am done using the library (the remaining code is pure CPS), in order to save memory. On Cori GPU this appears to cause subsequent allocations made with cudaMallocManaged to fail for sizes above roughly 32 MB. Looking at the code, the free is always performed with munmap, whereas under the GPU compile the allocation is performed with cudaMalloc. This mismatch is likely messing up the managed memory completely!
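The allocator/deallocator mismatch described above can be illustrated with a small sketch. It is not Grid's code: `Region`, `Backend`, `region_alloc` and `region_free` are hypothetical names, and `std::malloc`/`std::free` stand in for the device allocator pair (`cudaMalloc`/`cudaFree`) so that the example builds without a CUDA toolchain. The idea is only that the region records how it was allocated, so the free path can never fall through to `munmap` for memory that did not come from `mmap`.

```cpp
#include <cstddef>
#include <cstdlib>
#include <sys/mman.h>

// Hypothetical tag recording which allocator produced a region.
// Heap is a stand-in for a device allocator such as cudaMalloc.
enum class Backend { None, Mmap, Heap };

// Hypothetical handle (not a Grid type): pointer, size, and origin.
struct Region {
  void*       ptr     = nullptr;
  std::size_t bytes   = 0;
  Backend     backend = Backend::None;
};

// Allocate `bytes` with the requested backend and remember the choice.
// Returns an empty Region (ptr == nullptr) on failure.
inline Region region_alloc(std::size_t bytes, Backend backend) {
  Region r;
  if (bytes == 0) return r;

  if (backend == Backend::Mmap) {
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return r;
    r.ptr = p;
  } else if (backend == Backend::Heap) {
    // Assumption: a GPU build would call cudaMalloc here instead.
    r.ptr = std::malloc(bytes);
    if (r.ptr == nullptr) return r;
  } else {
    return r;
  }

  r.bytes   = bytes;
  r.backend = backend;
  return r;
}

// Release the region with the routine that matches its allocator.
// Returns true on success; an empty Region is rejected, not unmapped.
inline bool region_free(Region& r) {
  bool ok = false;
  switch (r.backend) {
    case Backend::Mmap:
      ok = (munmap(r.ptr, r.bytes) == 0);
      break;
    case Backend::Heap:
      // Assumption: a GPU build would call cudaFree here instead.
      std::free(r.ptr);
      ok = true;
      break;
    case Backend::None:
      return false;
  }
  r = Region{};  // reset so a second free is a harmless no-op
  return ok;
}
```

With this pattern a caller would do `Region r = region_alloc(n, Backend::Heap);` and later `region_free(r);`, and the free side dispatches on the recorded backend rather than assuming `munmap`. Whether Grid should track this per region or pick the free routine at compile time is a design choice this sketch does not settle.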

paboyle commented 4 years ago

Will take a look. Normally I expect this region to stay live for the whole run time of the application, especially given the huge page issues etc.