shwina opened this issue 4 years ago
Granted it's on the Python level where I'm not too concerned, but this kinda sounds like scope creep to me.
My hope would be to keep RMM as narrowly focused as we can.
Maybe a little, though I'm less worried about that personally. Where else would you imagine it living (if not RMM)?
I'm a big fan of "Do One Thing". Scope creep is how libraries become complicated behemoths that are difficult to maintain and change.
This issue is similar to why I pushed back on https://github.com/rapidsai/rmm/pull/220. RMM shouldn't be concerned about device memory allocated by anything other than RMM. Anything beyond that is the concern of another layer or library.
I don't have sufficient Python expertise to know where `Buffer` should live, but I'm not a big fan of just tacking it onto RMM simply because it is convenient.
From my view, I'm not tacking this onto RMM because it's convenient; if anything, it's a little inconvenient. I view this as managing memory that RAPIDS libraries could potentially use; it just happens that the memory wasn't allocated by the RMM allocator.
This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.
Can you just replace `cudf.Buffer` with `rmm.DeviceBuffer`?
Unfortunately no, because a `cudf.Buffer` could contain arbitrary device-memory-backed objects underneath it, such as CuPy/Numba arrays, whereas an `rmm.DeviceBuffer` is owning.
> Unfortunately no, because a `cudf.Buffer` could contain arbitrary device-memory-backed objects underneath it, such as CuPy/Numba arrays, whereas an `rmm.DeviceBuffer` is owning.
This sounds like what in C++ we call `span`.
Yeah that seems similar.
Basically `cudf.Buffer` is a non-owning view onto the memory. It holds a reference to the object that owns the memory.
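To make the "non-owning view that holds a reference to the owner" idea concrete, here's a minimal pure-Python sketch (the `View` class and its fields are hypothetical, not cudf's actual implementation; a `bytearray` stands in for a device allocation):

```python
# Hypothetical sketch of a non-owning buffer, loosely modeled on how
# cudf.Buffer keeps the owning object alive via a reference.
class View:
    """Records a pointer and size, and holds a reference to the owner
    so the underlying memory outlives the view."""

    def __init__(self, ptr, size, owner):
        self.ptr = ptr      # raw address; this object does not free it
        self.size = size    # length in bytes
        self.owner = owner  # reference that keeps the allocation alive

    def __len__(self):
        return self.size


# Any object can play the owner; a bytearray stands in for device memory.
data = bytearray(64)
view = View(ptr=id(data), size=len(data), owner=data)
```

The key design point: as long as `view` is alive, Python's reference counting keeps `data` alive too, so the pointer stays valid.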
I hadn't seen this issue before, but I recently started thinking about this after a conversation with @shwina and @jrhemstad about GPU memory management in Python. It seems like what we're looking for is a generic `gpumemoryview` with similar semantics to Python's `memoryview`, i.e. a standard representation for a non-owning view into memory. This object would be agnostic to the source of the underlying allocation. Most libraries could operate on the view alone (similar to libcudf C++ algorithms) and Python's reference counting logic would handle issues of scope. If a library needed to allocate memory internally (e.g. upon creation of a `cudf.DataFrame`) it could create the required rmm buffers then immediately generate the view and store it under the dataframe's columns exactly as `cudf.Buffer` is used right now. This approach would allow us to think bigger than RAPIDS alone so that we could also naturally consume data owned by CuPy, PyTorch, TensorFlow, and others.
If we went this route there would be a number of open questions to address here. Such an object could work directly with `__cuda_array_interface__`, but that would differ from how Python currently works: a `memoryview` is designed to work with the Python buffer protocol at the C level, while numpy (and numpy-like array libraries) are what consume `__array_interface__`. On the other hand, developing a GPU analog to the buffer protocol is probably unnecessary given the degree to which the ecosystem has centralized around CAI, so that semantic inconsistency is probably preferable to trying to overengineer a C API. There's also the original question on this thread of where such an object might live. I would probably advocate for a standalone package at least to start with, but I could see a case for rolling it into CUDA Python or something related to it.
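For reference, the host-side semantics being used as the model here are standard Python: `memoryview` is a non-owning, zero-copy view backed by the buffer protocol, and it keeps its exporter alive:

```python
# memoryview shares memory with its exporter rather than copying it.
data = bytearray(b"gpu-memory")
view = memoryview(data)

view[0:3] = b"GPU"       # writes through to the underlying bytearray
assert bytes(data) == b"GPU-memory"
assert view.obj is data  # the view holds a reference to its exporter
```

A `gpumemoryview` would aim for the same ownership semantics, just with CAI rather than the buffer protocol describing the memory.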
CC @gmarkall @leofang
Yeah we already did this in ucx-py with `Array`. We could refactor this out into a separate thing we depend on.
cc @pentschev @madsbk @quasiben
I have been devising an interface for C/C++ libraries to share a memory pool, and I would need a way to expose that interface all the way up so users can set it from Python. This sounds like a good idea worth exploring.
> i.e. a standard representation for a non-owning view into memory.

I thought that was the purpose of CAI? What's the difference between `gpumemoryview` and CAI?
It can help to have something in Cython, as it can expose both Python and C APIs.
That's true, but in my view the distinction is a little more fundamental. CAI is purely descriptive, so objects implementing the CAI must always be converted into some internal representation like `cudf.Buffer` that can be operated on programmatically. A `gpumemoryview` would replace that so that various libraries don't need to reinvent the wheel. To @jrhemstad's (much earlier) point about scope creep, I agree that the idea of a generic view into memory is out of scope for RMM, because the underlying allocation need not come from RMM. CAI was developed so that different libraries don't need to implement a standard API in order to be interoperable. Having a standalone object representation of the allocation described by CAI would improve that interoperability and provide a standard interchange object rather than a format alone. As long as the object itself supported the CAI, we could pass it to libraries that support CAI even if they were unaware of `gpumemoryview` and it would just work, but libraries could optionally build around `gpumemoryview` (instead of e.g. `cudf.Buffer`) and avoid needing to perform any preprocessing when provided one.
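The pass-through behavior described here can be sketched in a few lines. This is a hypothetical illustration (the `GpuMemoryView` class name and a fake CAI exporter are assumptions, and no GPU is involved): the wrapper re-exposes its source's `__cuda_array_interface__`, so any CAI-consuming library accepts it unchanged.

```python
class GpuMemoryView:
    """Hypothetical interchange object: wraps any CAI exporter and
    forwards its interface so CAI-aware libraries accept it as-is."""

    def __init__(self, obj):
        if not hasattr(obj, "__cuda_array_interface__"):
            raise TypeError("object does not expose __cuda_array_interface__")
        self._obj = obj  # owner reference keeps the allocation alive

    @property
    def __cuda_array_interface__(self):
        # Forward the exporter's description unchanged.
        return self._obj.__cuda_array_interface__


# A stand-in for a CuPy/Numba array; the pointer value is fabricated.
class FakeDeviceArray:
    __cuda_array_interface__ = {
        "shape": (16,),
        "typestr": "<f4",
        "data": (0xDEADBEEF, False),
        "version": 3,
    }


gmv = GpuMemoryView(FakeDeviceArray())
```

Because `gmv` itself exposes CAI, it can be handed to CAI-only consumers; libraries that know about the wrapper can skip re-parsing the dict entirely.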
Yep, I understand. The `Array` object in UCX-Py does both those things. We can rip it out into a standalone library.
@jakirkham Sorry, didn't mean that as a correction, more as an extended explanation to answer @jrhemstad's question.
> Yep, I understand. The `Array` object in UCX-Py does both those things. We can rip it out into a standalone library.

I am very much in favor of a standalone library implementing something like `Array` and all its nice-to-have utility functions.
Happy my comment helped kickstart this discussion again! Sounds like there is agreement on a direction forward.
I'm going to work on documenting next steps for this sometime soon.
Question regarding moving `cudf.Buffer` to `rmm`:

`rmm.DeviceBuffer` is a Cython wrapper around the C++ class `rmm::device_buffer`. `cudf.Buffer` more generally represents an untyped device memory allocation: a `Buffer` can be constructed from any object exposing `__array_interface__` or `__cuda_array_interface__`, e.g., CuPy arrays, numpy arrays, etc.

Does it make sense for `Buffer` to be moved to RMM?
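The construction behavior described above can be sketched roughly as follows. This is a hypothetical simplification, not cudf's actual code: accept any object exposing either interface, and record its pointer, byte size, and a reference to the owner (a fake host array stands in for numpy, so the pointer value is fabricated).

```python
import functools
import operator


def _nbytes(iface):
    """Total bytes = product(shape) * itemsize; typestr like '<f8' -> 8."""
    itemsize = int(iface["typestr"][2:])
    return functools.reduce(operator.mul, iface["shape"], 1) * itemsize


class Buffer:
    """Hypothetical sketch: an untyped view over any array-interface
    exporter, host or device."""

    def __init__(self, obj):
        iface = getattr(obj, "__cuda_array_interface__", None) \
            or getattr(obj, "__array_interface__", None)
        if iface is None:
            raise TypeError("no array interface found")
        self.ptr, _readonly = iface["data"]
        self.size = _nbytes(iface)
        self.owner = obj  # keep the source allocation alive


# Stand-in for a numpy array: 4x2 float64 -> 64 bytes.
class FakeHostArray:
    __array_interface__ = {
        "shape": (4, 2),
        "typestr": "<f8",
        "data": (0x1000, False),
        "version": 3,
    }


buf = Buffer(FakeHostArray())
```

Note how nothing in `Buffer` depends on who allocated the memory, which is the crux of the thread: that generality is exactly what makes it feel out of place inside an allocator library like RMM.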