rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0
8.34k stars 888 forks source link

[FEA] Stream-preserving constructors for scalar class and its derived classes #14333

Open shrshi opened 11 months ago

shrshi commented 11 months ago

Is your feature request related to a problem? Please describe. When the class constructor is implicitly called (when the object is passed-by-value, for instance), the constructor is called with the default stream, and not the CUDA stream passed by the user on which the algorithm is running. string_scalar(string_scalar const& other, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

Describe the solution you'd like Consider requiring stream to be passed to constructor by making it a non-default argument. string_scalar(string_scalar const& other, rmm::cuda_stream_view stream , rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

Describe alternatives you've considered None

Additional context None

GregoryKimball commented 11 months ago

I believe https://github.com/rapidsai/cudf/pull/14354 will address this

karthikeyann commented 10 months ago

14354 replaces all such occurences but it does not remove the default argument.

Removing default value for stream in scalars, is good suggestion. column also has similar copy constructor with defaulted stream argument. should similar removal be done for it too?

davidwendt commented 10 months ago

These are public interfaces and all public APIs currently have defaulted stream parameters. Wouldn't removing the default break many upstream callers? Specifically, the cuDF Cython and Python layers would be required to manufacture a stream to create a scalar? I don't think we should remove the default from any specific public interfaces unless we consider removing them from all perhaps.

karthikeyann commented 10 months ago

Specifically, the cuDF Cython and Python layers would be required to manufacture a stream to create a scalar?

Right. We can't remove stream from scalar methods or any other public interfaces.

cudf data types does not store streams (scalar, column, table, groupby). https://docs.rapids.ai/api/libcudf/nightly/developer_guide (libcudf API and Implementation → Streams). streams are passed to libcudf APIs. So, storing them inside data structure doesn't align well, and also execution should use stream (including memory allocation), IMO, data structures should not hold streams (although rmm differs from this). The constructors are called with streams because memory allocation and copy will execute on streams.

karthikeyann commented 10 months ago

It's made clear than the default arguments on public APIs can't be removed without a breaking change. Clarification on the issue title:

Stream-preserving constructors for scalar class and its derived classes

Does this mean to propose storing streams inside scalar class?

vyasr commented 10 months ago

Could we take a step back here and ask why

When the class constructor is implicitly called (when the object is passed-by-value, for instance)

is happening at all? cudf APIs are generally designed so that APIs should accept views rather than values. We don't have a scalar_view type (for reasons) but we typically handle this by instead passing references to scalar objects. Are there places where we really should be passing scalars by value? If so, why? Perhaps I'm missing something obvious, but this seems like potentially a design smell to me. If a scalar is passed by value and a copy constructor is called that implies the construction of a new scalar and a new device allocation, which isn't usually desirable.

IOW I don't know if there's a good reason for the copy constructor to not be explicit in the long run. I made the same point in a comment on the corresponding PR as well.