pmodels / mpich

Official MPICH Repository
http://www.mpich.org
Other
535 stars 280 forks source link

pamid: bgq must not use 'BG_MAPCOMMONHEAP=1' as a shared memory substitute. #2092

Closed mpichbot closed 7 years ago

mpichbot commented 7 years ago

Originally by blocksom on 2014-05-15 16:15:18 -0500


Several mpich tests are now failing with the following assert:

runjob -n 4 --ranks-per-node=8 --block R00-M0-N00-64 --envs BG_MAPCOMMONHEAP=1 : win_shared
...
win_shared: /bghome/blocksom/development/issues/v1r2m2/bgq/comm/sys/buildtools/pami/common/bgq/Memregion.h:59: pami_result_t PAMI::Memregion::createMemregion_impl(size_t*, size_t, void*, uint64_t): Assertion `rc == 0' failed.

This assert happens when the Kernel_CreateMemoryRegion kernel spi fails because the address passed to the kernel is outside of the calling process's memory domain.

The 'BG_MAPCOMMONHEAP=1' can be used to shared data between processes, but only as a side effect of the intended purpose of this feature. Any time a process uses a "common" heap memory address (that was not malloc'd by the process) as an input parameter to kernel syscalls "bad things will happen".

The existing MPI-3 shared window bgq implementation needs to be switched out in favor of a more traditional shared memory approach.

mpichbot commented 7 years ago

Originally by balaji on 2014-05-15 20:50:58 -0500


In what case would this fail? I assume we can still use this model when just doing load/store operations? Can I use the buffer for send/recv operations?

mpichbot commented 7 years ago

Originally by blocksom on 2014-05-16 10:11:41 -0500


It would fail whenever you ask the kernel (spi or syscall) to operate on an address which was malloc'd from the common heap by another process - because the kernel is still checking to see if the address is within the calling process's memory. One such check is in the spi/syscall that resolved a vaddr to a paddr for use as a PAMI memory region for PAMI rdma/onesided operations. It could be triggered by other syscalls, perhaps a posix write or something like that, but I haven't yet tested which other operations would cause similar failures. I could rework the code to avoid creating memory regions in this case, but I'm worried about what the application may do with this buffer - causing a similar error to occur - so I'd rather just avoid using this "common heap" feature to be safe.

The intended purpose of BG_MAPCOMMONHEAP was to better balance the memory layout on a bgq node. For example, the entire text segment is "charged" to process 0 which means process 0 has a smaller heap, etc than the other processes on the node. This "common heap" smoothes out this imbalance, and also has the benefit that if an application process needs to allocate more memory than its peers it is allowed to do so.

The implementation relaxed some security checks in the kernel - but not all of them.

mpichbot commented 7 years ago

Originally by blocksom on 2014-05-27 15:19:22 -0500


Fixed by 2b401aaf6188830db802bd9432028ca3526a699c