rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0
8.09k stars 874 forks source link

[FEA] Scalar factory functions do not let you provide a validity bool. #8633

Open nvdbaranec opened 3 years ago

nvdbaranec commented 3 years ago

Scalars can be null and contain a validity bool. The constructors for the various scalars take an is_valid parameter for this purpose. But the factory functions themselves don't let you specify one in the call. Seems like an oversight. This applies to all scalar types.

struct_scalar(table&& data,
                bool is_valid                       = true,
                rmm::cuda_stream_view stream        = rmm::cuda_stream_default,
                rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
std::unique_ptr<scalar> make_struct_scalar(
  table_view const& data,
  rmm::cuda_stream_view stream        = rmm::cuda_stream_default,
  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());
jrhemstad commented 3 years ago

The reason is because if you're constructing the scalar with an existing value, then it wouldn't make sense for it to be null.

nvdbaranec commented 3 years ago

The case here was for testing. Creating a null scalar and doing some interesting things with it (exploding it to a column, running copy_if_else(), etc)

jrhemstad commented 3 years ago

The case here was for testing. Creating a null scalar and doing some interesting things with it (exploding it to a column, running copy_if_else(), etc)

I don't think you'd use a factory then. I think you can just create a default constructed scalar of the appropriate type.

nvdbaranec commented 3 years ago

Another reason this doesn't work is: for nested types, you need to have a fully formed hierarchy of columns (the type-is-defined-by-the-hierarchy issue), even if the root scalar is null. So any constructor that takes just a validity bool isn't sufficient.

brandon-b-miller commented 3 years ago

Another reason this doesn't work is: for nested types, you need to have a fully formed hierarchy of columns (the type-is-defined-by-the-hierarchy issue), even if the root scalar is null. So any constructor that takes just a validity bool isn't sufficient.

dropping in to say we're running into this as well on the python side and creating basically an empty placeholder hierarchy and directly constructing the scalars without using the factories. cc @shaneding

jrhemstad commented 3 years ago

I've no problem with changing the current design. I was only explaining the rational of the way things are now.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.