rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0
4.25k stars 534 forks source link

refactor memory ownership svm #6073

Open mfoerste4 opened 2 months ago

mfoerste4 commented 2 months ago

As discussed in #6057 this is a PR to refactor the raw memory allocations in the svm model which are passed back to the caller.

I have removed all raw allocation pointers from the SVM model and replaced them by device_buffers. Since ownership is more restrictive now we do have additional data copies when:

CC @tfeher, @divyegala

mfoerste4 commented 1 month ago

Thanks @divyegala for the rewiew. I have adressed or commented on your immediate suggestions.

I see that the C and Python APIs need a copy and ultimately, this is because rmm::device_buffer in this structure as well as the model is owned by the struct/class. Instead, if we use std::weak_ptr<rmm::device_buffer>> or plain rmm::device_buffer& then we can avoid copies and ownership issues by letting the C and Python API own the data but still allowing the C++ API to resize it.

The C API can also release the shared_ptr/device_buffer to transfer ownership to the user, thus removing the need for a copy there.

The Python API is owned by us and so, we can create shared_ptr/device_buffer in the Cython layer and store them as class members.

I don't quite understand how this can be accomplished with the given device_buffer implementation. Even if we pass the storages as unique pointers - we then can pass ownership of the whole device-buffer, but we will not be able to pass in/out actual memory allocations within the device_buffer.

For the python API we retrieve data as CumlArrayand want to pass the ptr into the device_buffer. For the C-API we have fixed pointer style I/O where cannot simply switch to device_buffer*. IIUC we would need a class like the device_buffer but with additional capabilities to switch in between owning and not-owning.

divyegala commented 1 month ago

@mfoerste4

Let me write an example for Python API and how to handle ownership, assuming the C++ API does not own the memory and instead holds onto references of the form rmm::device_buffer&. As for the C API, I think you are right and there's no way for us to release the ownership of memory, but at least we can avoid copies in the Python world:

Python API: As mentioned in the original issue, instead of CumlArray we can use DeviceBuffer from RMM Python

  1. .pxd
  2. .pyx

As seen in the .pxd definition, the Python DeviceBuffer class holds an underlying reference to the C++ object device_buffer in the form of the property unique_ptr<device_buffer> c_obj. Thus, the code looks like:

class SVM:
def __init__(self):
  self.some_array = rmm.DeviceBuffer(...) # instead of CumlArray

def some_func(self):
  cpp_func(deref(some_array.c_obj.get()), ...)
mfoerste4 commented 1 month ago

@divyegala , thanks for your suggestions. I have refactored the SvmModel to work with device_buffer* in order to paste in the buffers after (de-)serialization without a copy.