Closed: minsii closed this issue 3 years ago
One difference in the Level Zero API is that there is an explicit device handle argument to the memory allocator. I think we can handle this initially by just getting a handle (and context) at init time and using the same one for all allocations. How does that sound @minsii? We'll need to add an init function like we have in MPL to gather device information.
https://spec.oneapi.com/level-zero/latest/core/api.html#zememallocdevice
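A minimal sketch of the "get a handle and context at init time, reuse them for every allocation" idea. It is plain C with stand-ins, not real Level Zero code: every `sketch_*` name is hypothetical, and the backend allocator just calls host `malloc` where a real implementation would pass the cached handles to `zeMemAllocDevice`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the Level Zero context and device handles. */
typedef struct { int id; } sketch_context_t;
typedef struct { int id; } sketch_device_t;

/* Handles gathered once at init time and reused for every allocation. */
static struct {
    int initialized;
    sketch_context_t context;
    sketch_device_t device;
} sketch_gpu;

/* Stand-in for an allocator that takes explicit handles (assumption:
 * a real backend would call the device allocator here, not malloc). */
static void *sketch_backend_alloc(const sketch_context_t *ctx,
                                  const sketch_device_t *dev, size_t size)
{
    assert(ctx != NULL && dev != NULL);
    return malloc(size);
}

/* Record the context and device once; returns 0 on success, -1 on error. */
int sketch_gpu_init(int device_id)
{
    if (device_id < 0)
        return -1;
    sketch_gpu.context.id = 0;
    sketch_gpu.device.id = device_id;
    sketch_gpu.initialized = 1;
    return 0;
}

/* Allocate using the handles cached by sketch_gpu_init; NULL if not initialized. */
void *sketch_gpu_malloc(size_t size)
{
    if (!sketch_gpu.initialized)
        return NULL;
    return sketch_backend_alloc(&sketch_gpu.context, &sketch_gpu.device, size);
}

/* Report which device the cached handle refers to, or -1 before init. */
int sketch_gpu_device_id(void)
{
    return sketch_gpu.initialized ? sketch_gpu.device.id : -1;
}
```

With this shape, callers of the allocator never see the handles, so the internal allocation API only changes in one place.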
FYI, CUDA needs you to set the device before doing memory allocation too (even though the function itself doesn't take the device argument). So they are equivalent (except for one function call vs. two function calls).
Oh, I see. So in the tests, the user sets the CUDA device before creating the space. I suppose we can extend `shmemx_space_config_t` to take additional device information for this purpose, like we do with MPL attributes. We still need to modify the internal allocation APIs, but that shouldn't be too bad.
@raffenet here is an example for the user program of oshmpi+space+cuda memkind.
We assume the user sets the CUDA device, so that `shmemx_space_create` internally calls `cudaMalloc` to allocate the buffer on this device.
Extending `shmemx_space_config_t` sounds good to me. We want to minimize the device-specific tasks handled by OSHMPI: the user sets the device, and OSHMPI only allocates the GPU buffer and passes it down to MPI.
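One way the extended config could look, as a rough sketch only. The `ex_*` types and the `device_id` field are hypothetical and are not the actual `shmemx_space_config_t` layout. Host `malloc` stands in for the device allocator so the example runs without a GPU runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical memory kinds for the sketch. */
typedef enum { EX_MEM_HOST, EX_MEM_DEVICE } ex_memkind_t;

/* Hypothetical config: the user picks the device, the library only allocates. */
typedef struct {
    size_t sheap_size;     /* size of the symmetric heap for this space */
    ex_memkind_t memkind;  /* host or device memory */
    int device_id;         /* assumed new field, chosen by the user */
} ex_space_config_t;

typedef struct {
    void *base;
    size_t size;
    ex_memkind_t memkind;
    int device_id;
} ex_space_t;

/* Create a space from the config; returns 0 on success, -1 on error. */
int ex_space_create(ex_space_config_t config, ex_space_t *space)
{
    if (space == NULL || config.sheap_size == 0)
        return -1;
    if (config.memkind == EX_MEM_DEVICE && config.device_id < 0)
        return -1;

    /* Stand-in: a real backend would allocate on config.device_id here. */
    space->base = malloc(config.sheap_size);
    if (space->base == NULL)
        return -1;

    space->size = config.sheap_size;
    space->memkind = config.memkind;
    space->device_id = config.device_id;
    return 0;
}

/* Release the buffer and reset the space. */
void ex_space_destroy(ex_space_t *space)
{
    assert(space != NULL);
    free(space->base);
    space->base = NULL;
    space->size = 0;
}
```

Keeping the device choice in the config leaves device selection with the user, and the library side stays limited to allocating the buffer and handing it on.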
Test added via #107