Open ttnghia opened 2 years ago
Tag directly related devs: @devavret @jrhemstad @bdice.
The only reason it was one monolithic header is because it was a replacement of the single row_operators.cuh
header. Splitting it is fine. We might need to keep equality and hashing in the same header because they use each other's pre-processed table
Then the hashing header can just include the equality header. Some source files just need equality comparison.
Wait: Using each other table? Then I'm not sure...
This idea seems fine to me. We'll want to make sure that the designs stay parallel (wherever possible) despite being in separate files.
Thanks all. Then I'll go ahead and do this split. Will have a PR up shortly.
I'm fine with this, but let's wait on splitting them up until we are ready to move them out of "experimental". Then we can split them as we move their namespaces.
This issue has been labeled inactive-30d
due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d
if there is no activity in the next 60 days.
I believe that addressing this is blocked on #11844, or at least should be conditioned on addressing that issue. Once the newer comparators are used everywhere, we can remove the old comparators and move the new ones into the places that we want them.
@vyasr @bdice @ttnghia @GregoryKimball
I am about to go ahead and start on this issue, please let me know your thoughts. I prefer breaking these up as:
lexicographical/row_operators.cuh
equality/row_operators.cuh
hash/row_operators.cuh
The reason I prefer to split them up this way is that we will be mirroring the sub-directory structure with the namespaces.
Any common code can be present in row_operators_common.cuh/hpp
.
Furthermore, in my PR I will add some redundancy if any code in experimental
relies on the current version of row_operators.cuh
by copying over those sections. By doing this, when we are ready to drop experimental
, we just delete the old row_operators.cuh
, move up contents of experimental
to the parent directory, and perform simple find-and-replace to remove experimental::
from the call sites.
I have a slight preference for subdirectories over underscores as well, and I am fine with the temporary duplication.
This proposal sounds fine to me. Roughly how far are we from migrating entirely to "experimental" comparators? Are there any hard cases left, or are all the remaining instances relatively straightforward?
@bdice there are a couple of hard cases left. The most notorious one being joins. The actual update is easy, of course, but it's going to be difficult testing each and every single join for nested types in terms of it being a mechanical, time consuming process.
That being said, I'm hoping we can drop experimental by the very end of 23.04 release, or very start of 23.06 release.
So you're proposing:
lexicographic/row_operators.cuh
equality/row_operators.cuh
...
instead of
row_operators/common.cuh
row_operators/lexicographic.cuh
...
Then what other files will be in these directories (lexicographic/
, equality/
, etc)? And where the row_operators_common.cuh/hpp
files will be placed?
@ttnghia no other files for now, but giving us the option to have comparator-specific files to put in these sub-directories in the future.
row_operators_common.cuh/hpp
file will be in the parent directory, which in this case is: cudf/include/table/
I suggest this because the namespacing is super confusing right now. row::lexicographic::...
but we have no directories called row
or lexicographic
The namespace does not always align with directories. You can see many files like cudf/detail/something.hpp
vs cudf/something/detail/something_detail.hpp
which all have the same namespace hierarchy.
That is a specific structure of the detail
API that libcudf follows - I don't see where else cudf doesn't follow namespace <-> directory.
Alternatively, I would also be okay with row/lexicographic.cuh
to follow the namespace row::lexicographic
.
While I have been doing these updates to the experimental API, I always forget what the actual namespace of the comparators are because they don't make any sense when it comes to the directory/file naming.
Currently,
experimental/row_operators.cuh
file is huge: over 1200LOC and is growing. Including the entire file into a source file that only uses a small part of it is burdensome. I also feel dizzy while navigating it trying to find a struct/function of interest.We should split it up (and re-organize files) into separate files like:
experimental/row_operators/common.cuh
experimental/row_operators/lexicographic.cuh
experimental/row_operators/equality.cuh
experimental/row_operators/hashing.cuh