Open jrhemstad opened 2 years ago
@bdice @rwlee @devavret
Staring at this some more and thinking about names for the "owning" comparators I realized that there is effectively no reason for them to exist.
Once you make the callables freestanding types, the only purpose of the owning comparators is to add the minor convenience of not needing to construct the preprocessed tables yourself. Otherwise we can eliminate them entirely.
// self
table_view input_table{...};
preprocessed_table processed_input{input_table}; // "Expensive" construction that does necessary pre-processing and allocations
auto callable = cudf::row::self::less{p}; // "Cheap" construction that _views_ the internals of the preprocessed table
thrust::sort(..., callable);
...
// two table
table_view lhs{...};
table_view rhs{...};
auto lhs_processed = preprocessed_table{lhs};
auto rhs_processed = preprocessed_table{rhs};
auto callable = cudf::row::two::less{lhs_processed, rhs_processed};
thrust::merge(..., callable);
I have an idea of how we can get rid of needing to be explicit about self
vs two_table
as well that I need to explore some more.
Staring at this some more and thinking about names for the "owning" comparators I realized that there is effectively no reason for them to exist.
This is how I was thinking of designing this originally.
I would prefer some other term like comparator_builder
for the owning ones. So you create just one builder that provides any type of comparator that you need later on, or even to share its preprocessed data.
Alright, here's how we can get rid of the owning comparator types all together (obviously keeping the preprocessed_table
owning type) and get rid of the user-facing "two-table" vs "self" comparators.
https://godbolt.org/z/fz1veocdc
The main idea is to just infer "self" vs "two-table" based on how many tables are passed to the less
constructor. Using deduction guides we can steer these two situations to two different partial specializations of the less
template.
IMO, it looks more natural to do it this way:
auto builder = ...;
auto comp = builder.less(); // builder.equal();
than doing this way:
auto lhs = preprocessed_data{...}; // <= what data? why preprocessed?
auto rhs = preprocessed_data{...};
auto comp = less{lhs, rhs};
Maybe I missed some key insight here?
Cuts both ways :)
auto builder = ...; // what builder? Why build?
auto comp = builder.less(); // less? Is this a predicate? builder is less than what?
Furthermore, the preprocessed_data
was just a shorthand for the proof of concept. The real code would still keep the preprocessed_table
nomenclature.
table_view lhs, rhs;
auto lhs_processed = preprocessed_table{lhs};
auto rhs_processed = preprocessed_table{rhs};
auto callable = less{lhs_processed, rhs_processed};
Also, if you want to reuse the same preprocessed tables with a builder it becomes even more code:
table_view lhs, rhs;
auto lhs_processed = preprocessed_table{lhs};
auto rhs_processed = preprocessed_table{rhs};
auto builder = build{lhs_processed, rhs_processed};
auto comp = builder.less();
All the "builder" does is hide the construction of the preprocessed_table
in some situations. That does provide some minor convenience, but imo not enough to make it worth all the extra code and complexity of providing the builder.
Also, with a "builder" it's not clear what it's lifetime requirements are. Is it owning? How long do I need to keep it alive?
Constructing the preprocessed_table
makes it explicit to the caller that this is an object that needs to be kept alive.
This issue has been labeled inactive-30d
due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d
if there is no activity in the next 60 days.
This issue has been labeled inactive-90d
due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
One additional consideration that came up here during the implementation of the list lexicographic comparator is the need for separate code paths for when the compared tables contain nested data to avoid slowdowns of the non-nested data case due to the compiler's inability to sufficiently inline and optimize the complex code paths involving nested types. Whatever approach we take here will need to account for that as well.
Is your feature request related to a problem? Please describe.
The new experimental row operators require non-trivial preprocessing that involves new allocations whose lifetime must be maintained while attempting to do any row-wise operations on the specified data.
To manage this, we introduced owning and non-owning comparator types.
For example,
self_comparator
is an owning type that handles doing row-wise operations on a single table.self_comparator
is not a binary callable object (it doesn't have anoperator()
). Creating the actual non-owning callable object is currently done through a member factory function of the owning type (renamed in https://github.com/rapidsai/cudf/pull/10870).After reviewing code using this functionality I've noticed that the callable being returned from a member of the owning type is a bit awkward. For instance the
s.less()
call above isn't immediately obvious that this is actually a factory returning a callable function object. One way to remedy that could be to call its.make_less()
instead, but I think there's an all together better way.Describe the solution you'd like Inspired by
std::
function objects likestd::less/std::equal_to
, I think we should make the callable objects currently being returned from functions likeless()/less_equivalent()/equal_to()
to instead be freestanding types that are constructible from the owning types instead of being returned from a factory of the owning type.I think this would simplify the owning type as well as make the owner/viewer relationship more clear and explicit.
For example, the code above would become:
Here's a high level sketch of how this idea could be implemented: https://godbolt.org/z/5fbK7PTb6
Salient points:
less
is a standalone type constructible from theowning
typeless
is afriend
of the owning type to access internalsless
deletes constructions from an r-value ref of the owning type to prevent construction from a temporary of the owning type. If allowed, this would lead to dangling references.less::operator()
just invoke thePhysicalComparator
. The actual implementation would have more layers. It would be roughly equivalent to what the internals of theself_comparator::less
factory above does today.Additional Thoughts
I've come to believe that "comparator" is probably an inappropriate name for the owning types. It's not a comparator (not invokable), it just preprocesses and holds data needed by the actual comparators.
I don't have a good suggestion for a different name yet.