Describe the bug
While investigating cuml benchmarks, I found an issue with the current system_memory_resource that causes a segfault. Roughly, it occurs in code like this:
```cpp
void foo(...) {
  rmm::device_uvector<T> tmp(bufferSize, stream);
  // launch CUDA kernels making use of tmp
}
```
When the function returns, the device_uvector goes out of scope and is destroyed while the CUDA kernel may still be in flight. With cudaFree, the CUDA runtime performs implicit synchronization to ensure the kernel finishes before actually freeing the memory, but with SAM we don't have that guarantee, causing use-after-free errors.
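A minimal caller-side workaround (a sketch under assumptions: `foo`, `bufferSize`, and `T` are placeholders from the example above, not real cuml symbols) is to synchronize the stream before the vector's destructor runs, restoring the ordering that cudaFree's implicit synchronization used to provide:

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>

template <typename T>
void foo(std::size_t bufferSize, rmm::cuda_stream_view stream) {
  rmm::device_uvector<T> tmp(bufferSize, stream);
  // launch CUDA kernels making use of tmp ...

  // With system_memory_resource (SAM), deallocation does not implicitly
  // synchronize the way cudaFree does, so explicitly wait for in-flight
  // kernels before tmp's destructor frees the memory.
  stream.synchronize();
}  // tmp destroyed here; the memory is no longer in use by the device
```

This only papers over the issue at each call site; a more general fix would be for system_memory_resource's deallocation path to honor stream ordering itself.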
Steps/Code to reproduce bug
This was discovered by running the Spark RAPIDS ML benchmark with system mr enabled.
Expected behavior
Should not segfault.
Environment details (please complete the following information):