Open vyasr opened 5 months ago
I think there should be no default parameters, to force the python-side caller to be fully explicit about stream and memory management.
My rationale is that since RMM objects are (deliberately) not smart pointers, there is no way to safely take ownership of an rmm::device_buffer
from Python. Specifically, if I am passed a device_buffer
whose memory resource is of a provenance I do not control, I cannot guarantee that I keep the memory resource alive for the lifetime of the buffer.
Indeed, in RMM, we don't even try!
We just do:
buf.c_obj = move(ptr)
buf.mr = get_current_device_resource()
and fingers crossed, ptr.mr
is the same as get_current_device_resource()
.
There's no way to work around this, so really the only safe way is to document in RMM that if taking ownership of a device_buffer
from Python, one must only do so when the buffer has been allocated via a memory resource that the python process controls. In libcudf, we can always pass a memory resource in to any allocating routines, so if we do that from the python side with a memory resource whose lifetime we control, we can match the required contract and ensure that we do things safely.
Right now, everything works through happenstance.
I've opened https://github.com/rapidsai/rmm/issues/1492 to clarify the requirements in docs on the RMM side.
Right, the mr discussion is in #14229.
Playing devil's advocate: while the pitfalls of default mrs are clear, that also makes the overall API very cumbersome when you have a libcudf API with 7 parameters (including stream/mr) of which 5 are defaulted because there are sensible choices. This would force users to also provide the other 3 in addition to stream/mr.
If we restrict the "no defaults" choice to only APIs then it would still allow us to make some improvements like those suggested in #15130, but if we disallowed defaults everywhere it could have additional impacts on how much syntactic sugar we can add to the pylibcudf API in a performant manner.
I think I am advocating only that the wrapping of the libcudf API provides no defaults. I would alternately be happy if the cpdef
functions took keyword defaulted arguments for stream and mr (so that one can provide, say, the mr but not the stream if one wants)
Is your feature request related to a problem? Please describe. Currently some pylibcudf APIs have default values for parameters while others do not. Defaults are a double-edged sword because while they are convenient, the underlying libcudf APIs also have defaults and the two could diverge.
Describe the solution you'd like pylibcudf APIs should decide whether they offer defaults. The two options would be to either offer no defaults and require users to provide all parameters, or offer the same defaults as libcudf, in which case it would be the responsibility of pylibcudf devs to keep those defaults in sync with libcudf.
Describe alternatives you've considered There isn't really any cost to the defaults in pylibcudf diverging from those in libcudf, so as long as they're documented it might be fine to have defaults for user convenience and not be overly concerned about ensuring that they remain identical to libcudf's in perpetuity.