Open shrshi opened 11 months ago
I believe https://github.com/rapidsai/cudf/pull/14354 will address this
Removing default value for stream in scalars, is good suggestion. column
also has similar copy constructor with defaulted stream
argument. should similar removal be done for it too?
These are public interfaces and all public APIs currently have defaulted stream parameters. Wouldn't removing the default break many upstream callers? Specifically, the cuDF Cython and Python layers would be required to manufacture a stream to create a scalar? I don't think we should remove the default from any specific public interfaces unless we consider removing them from all perhaps.
Specifically, the cuDF Cython and Python layers would be required to manufacture a stream to create a scalar?
Right. We can't remove stream from scalar methods or any other public interfaces.
cudf data types does not store streams (scalar, column, table, groupby). https://docs.rapids.ai/api/libcudf/nightly/developer_guide (libcudf API and Implementation → Streams). streams are passed to libcudf APIs. So, storing them inside data structure doesn't align well, and also execution should use stream (including memory allocation), IMO, data structures should not hold streams (although rmm differs from this). The constructors are called with streams because memory allocation and copy will execute on streams.
It's made clear than the default arguments on public APIs can't be removed without a breaking change. Clarification on the issue title:
Stream-preserving constructors for scalar class and its derived classes
Does this mean to propose storing streams inside scalar class?
Could we take a step back here and ask why
When the class constructor is implicitly called (when the object is passed-by-value, for instance)
is happening at all? cudf APIs are generally designed so that APIs should accept views rather than values. We don't have a scalar_view
type (for reasons) but we typically handle this by instead passing references to scalar objects. Are there places where we really should be passing scalars by value? If so, why? Perhaps I'm missing something obvious, but this seems like potentially a design smell to me. If a scalar is passed by value and a copy constructor is called that implies the construction of a new scalar and a new device allocation, which isn't usually desirable.
IOW I don't know if there's a good reason for the copy constructor to not be explicit in the long run. I made the same point in a comment on the corresponding PR as well.
Is your feature request related to a problem? Please describe. When the class constructor is implicitly called (when the object is passed-by-value, for instance), the constructor is called with the default stream, and not the CUDA stream passed by the user on which the algorithm is running.
string_scalar(string_scalar const& other, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
Describe the solution you'd like Consider requiring stream to be passed to constructor by making it a non-default argument.
string_scalar(string_scalar const& other, rmm::cuda_stream_view stream , rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
Describe alternatives you've considered None
Additional context None