rapidsai / rmm

RAPIDS Memory Manager
https://docs.rapids.ai/api/rmm/stable/
Apache License 2.0

[FEA] Ensure that Cython wrappers for allocating objects correctly expose and use memory resources #1515

Open wence- opened 6 months ago

wence- commented 6 months ago

Is your feature request related to a problem? Please describe.

Historically, the Cython wrappers around device allocation routines use the implicit device resource returned by get_current_device_resource and don't generally offer an interface for the user to provide a memory resource to perform the allocation.

This is "OK", but often (e.g. see #1514) we don't pass the Python-side MR into the C++ allocation routines.

We therefore don't actually have any control over the MR being used to perform the allocation, and so can (with a little, but not a lot, of gymnastics) get into a situation where an allocation is performed using resource-A but we store resource-B, which is not compatible.

This latter dichotomy exists because set/get_current_device_resource exist in both the C++ library and the Cython wrappers, but don't talk to the same underlying map. On the C++ side, set_current_device_resource stores a raw pointer to a memory resource in a std::unordered_map, and get_current_device_resource returns this raw pointer.

Unfortunately, on the Python side, we can't use a raw pointer since we can't control its lifetime. So in Python, set_current_device_resource stores the Python MR in a dict and then calls the C++ set_current_device_resource on the raw pointer that backs the Python MR. So far, all good. But the Python-level get_current_device_resource only inspects the dict (it doesn't call the C++ get_current_device_resource). This is where the problems can begin: if C++ code calls set_current_device_resource after a Python call to the matching API function, the Python and C++ sides will no longer agree on what the current device resource is.

Describe the solution you'd like

All Python functionality that involves allocation should optionally take a memory resource argument (similarly to the C++ interface). This argument should default to the Python-side current device resource and should then be passed on to C++, so that by the time Python reaches C++ there is no implicit memory resource being obtained.
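
A minimal sketch of what this could look like, again as a pure-Python toy rather than real RMM code. ToyResource, ToyBuffer, and _cpp_allocate are hypothetical names; _cpp_allocate stands in for a C++ allocation routine that receives the resource explicitly instead of looking one up itself.

```python
# Toy model only: these names are illustrative, not RMM's API.

class ToyResource:
    """Stand-in for a memory resource that records who allocated."""

    def __init__(self, name):
        self.name = name

    def allocate(self, nbytes):
        return (self.name, nbytes)


_current = {"mr": ToyResource("default")}


def get_current_device_resource():
    """Python-side notion of the current resource."""
    return _current["mr"]


def _cpp_allocate(nbytes, mr):
    # Stand-in for the C++ routine: the resource is always passed in,
    # so nothing is looked up implicitly on the C++ side.
    return mr.allocate(nbytes)


class ToyBuffer:
    def __init__(self, size, mr=None):
        # Default to the Python-side current resource, then pass it on.
        if mr is None:
            mr = get_current_device_resource()
        self.mr = mr  # the stored resource is the one that allocated
        self.allocation = _cpp_allocate(size, mr)
```

With this shape, the resource stored on the object and the resource that performed the allocation are the same by construction, whether the caller passes mr explicitly or relies on the default.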

Describe alternatives you've considered

Promote interfaces in RMM to take managed, rather than raw, pointers. Then we can always arrange for C++ and Python to be consistent, since we can safely maintain the data in C++ and borrow it from Python. However, this goes against the current strategy of using cuda::memory_resource.

Additional context

See also:

harrism commented 6 months ago

> Unfortunately, on the Python side, we can't use a raw pointer since we can't control its lifetime.

Naive question: why do we need Python to control the lifetime of the current device resource? Why not just let the C++ library control its lifetime? After all, an application might be using RMM from Python while also using a Python module with a C++ core that uses RMM only from C++. These need to have the same resource controls. Setting the current device resource from one place should affect the current device resource everywhere.