Describe the bug
While investigating cuml benchmarks, I found an issue with the current system_memory_resource that causes a segfault. Roughly, it occurs in code like this:
```cpp
void foo(...) {
  rmm::device_uvector<T> tmp(bufferSize, stream);
  // launch CUDA kernels making use of tmp
}
```
When the function returns, the device_uvector goes out of scope and is destroyed while the CUDA kernel may still be in flight. With cudaFree, the CUDA runtime performs implicit synchronization to ensure the kernel finishes before actually freeing the memory, but with SAM we don't have that guarantee, causing use-after-free errors.
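A minimal caller-side workaround (a sketch under assumptions: `foo`, `bufferSize`, and `T` are placeholders from the example above, not real cuml symbols) is to synchronize the stream before the vector's destructor runs, restoring the ordering that cudaFree's implicit synchronization used to provide:

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>

template <typename T>
void foo(std::size_t bufferSize, rmm::cuda_stream_view stream) {
  rmm::device_uvector<T> tmp(bufferSize, stream);
  // launch CUDA kernels making use of tmp ...

  // With system_memory_resource (SAM), deallocation does not implicitly
  // synchronize the way cudaFree does, so explicitly wait for in-flight
  // kernels before tmp's destructor frees the memory.
  stream.synchronize();
}  // tmp destroyed here; the memory is no longer in use by the device
```

This only papers over the issue at each call site; a more general fix would be for system_memory_resource's deallocation path to honor stream ordering itself.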
Steps/Code to reproduce bug
This was discovered by running the Spark RAPIDS ML benchmark with system mr enabled.
Expected behavior
Should not segfault.
Environment details (please complete the following information):