philip-davis / dataspaces

Git Home of the RDI2 DataSpaces Project
BSD 2-Clause "Simplified" License

Stack variables being submitted as RDMA post descriptors in GNI transport #30

Closed by philip-davis 5 years ago

philip-davis commented 7 years ago

See, for example, rpc_fetch_request and rpc_post_request in dart_rpc_gni.c. The post descriptor pointer passed in is stored by the library (at least on Aries) rather than having its contents copied, so this causes problems once the stack variable goes out of scope. I am checking on Titan to see whether the same behavior happens with Gemini. This appears to be the cause of the processing errors in __process_event when running GNI_GetCompleted.

philip-davis commented 7 years ago

Gemini stores the pointer as well, so this is a problem on Titan too. Titan seems to handle some aspects of memory allocation differently from Cori, which is why Cori has exposed other memory errors before. I will have to change every place where GNI_PostRdma is called with a pointer to a stack variable so that the descriptor is heap-allocated instead, and also add a matching free in __process_event.
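A minimal sketch of the pattern described above, not the actual DataSpaces code. It assumes a mock transport in place of GNI so that it compiles without the Cray headers: `mock_desc`, `mock_post`, `mock_get_completed`, `post_request`, `process_event` and `demo_roundtrip` are all hypothetical names. The only behavior taken from this issue is that the transport keeps the caller's pointer instead of copying the descriptor, so the descriptor has to live on the heap until the completion handler frees it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GNI post descriptor; the real type has
 * many more fields. */
struct mock_desc {
    int payload;
};

/* Mock transport. Like the behavior reported in this issue, it stores
 * the caller's pointer and does not copy the descriptor. */
static struct mock_desc *pending = NULL;

static void mock_post(struct mock_desc *d)
{
    pending = d;
}

static struct mock_desc *mock_get_completed(void)
{
    struct mock_desc *d = pending;
    pending = NULL;
    return d;
}

/* Post side: allocate the descriptor on the heap so that it outlives
 * this function. A stack variable here would dangle after return. */
static int post_request(int payload)
{
    struct mock_desc *d = malloc(sizeof(*d));
    if (d == NULL)
        return -1;
    d->payload = payload;
    mock_post(d);
    return 0;
}

/* Completion side: take the descriptor back from the transport, read
 * what is needed, then free it exactly once. */
static int process_event(void)
{
    struct mock_desc *d = mock_get_completed();
    assert(d != NULL);
    int payload = d->payload;
    free(d);
    return payload;
}

/* Posts one request and processes its completion. */
int demo_roundtrip(void)
{
    if (post_request(42) != 0)
        return -1;
    return process_event();
}
```

The point of the design is that ownership of the descriptor passes to the transport at post time and comes back at completion time, so the completion handler is the natural place for the free.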

philip-davis commented 7 years ago

Implemented the fix in the debug branch (for DART, not DIMES; I'm about 85% sure that DIMES does not have this problem, but I need to trace the RPC creation code backwards to be sure). ORNL is checking whether this resolves their issue.

philip-davis commented 5 years ago

Resolved!