In my CPS A2A code I have always deallocated Grid's shared memory region using Grid::GlobalSharedMemory::SharedMemoryFree() once I am done using the library (the remaining code is pure CPS), in order to save memory. On Cori GPU this appears to cause subsequent allocations made with cudaMallocManaged to fail for sizes above roughly 32 MB. Looking at the code, the free seems to be performed with munmap unconditionally, whereas in the GPU build the allocation is done with cudaMalloc. This mismatch is likely what is corrupting the managed memory.
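A minimal sketch of the kind of fix I would expect, assuming the problem is simply that the free path does not mirror the alloc path. This is not Grid's actual code. The helper names shm_alloc and shm_free are hypothetical, and GRID_CUDA is assumed to be the macro that selects the GPU build. The point is that memory obtained from cudaMalloc should be released with cudaFree, and only memory obtained from mmap should be released with munmap.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#ifdef GRID_CUDA  // assumption: macro selecting the GPU build
#include <cuda_runtime.h>
#endif

// Hypothetical helper: allocate the shared region with the backend for this build.
void* shm_alloc(std::size_t bytes) {
#ifdef GRID_CUDA
  void* ptr = nullptr;
  if (cudaMalloc(&ptr, bytes) != cudaSuccess) return nullptr;
  return ptr;
#else
  void* ptr = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  return ptr == MAP_FAILED ? nullptr : ptr;
#endif
}

// Hypothetical helper: release the region with the call that matches shm_alloc.
// Calling munmap on a cudaMalloc pointer would unmap address space the CUDA
// driver believes it still owns.
bool shm_free(void* ptr, std::size_t bytes) {
#ifdef GRID_CUDA
  (void)bytes;  // cudaFree does not need the size
  return cudaFree(ptr) == cudaSuccess;
#else
  return munmap(ptr, bytes) == 0;
#endif
}
```

Keeping both branches behind the same macro means the alloc and free paths cannot drift apart when the build configuration changes.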
In my CPS A2A code I have always deallocated Grid's shared memory region using Grid::GlobalSharedMemory::SharedMemoryFree() after I am done using the library (the remaining code is pure CPS) in order to save memory. On Cori GPU this appears to be causing future memory allocations performed using CudaMallocManaged to fail for allocs >~ 32MB. Looking at the code it seems the free is universally being performed using munmap whereas under the GPU compile the alloc is being performed with CudaMalloc, and this is likely completely messing up the managed memory!