pmodels / mpich

Official MPICH Repository
http://www.mpich.org

OFI: memory registration of cuda memory #7148

Open thomasgillis opened 1 week ago

thomasgillis commented 1 week ago

Hi all,

While taking a closer look at #7140, I realized that MPICH does not seem to pass the correct device handle to fi_mr_regattr.

The documentation specifies that:

device
- Reserved 64 bits for device identifier if using non-standard HMEM interface. This field is ignore unless the iface field is valid. Otherwise, the device field is determined by the value specified through iface.
cuda
- For FI_HMEM_CUDA, this is equivalent to CUdevice (int).

However, MPICH passes attr->device, which is obtained from cudaPointerGetAttributes. I am not familiar with the difference between the device handle and the device id, but the documentation of cuDeviceGet seems to suggest that there is one:

Returns a handle to a compute device.
Parameters
device
- Returned device handle
ordinal
- Device number to get handle for