rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0
8.27k stars 884 forks source link

[FEA] Split `experimental/row_operators.cuh` #11012

Open ttnghia opened 2 years ago

ttnghia commented 2 years ago

Currently, experimental/row_operators.cuh file is huge: over 1200LOC and is growing. Including the entire file into a source file that only uses a small part of it is burdensome. I also feel dizzy while navigating it trying to find a struct/function of interest.

We should split it up (and re-organize files) into separate files like:

ttnghia commented 2 years ago

Tag directly related devs: @devavret @jrhemstad @bdice.

devavret commented 2 years ago

The only reason it was one monolithic header is because it was a replacement of the single row_operators.cuh header. Splitting it is fine. We might need to keep equality and hashing in the same header because they use each other's pre-processed table

ttnghia commented 2 years ago

Then the hashing header can just include the equality header. Some source files just need equality comparison.

Wait: Using each other table? Then I'm not sure...

bdice commented 2 years ago

This idea seems fine to me. We'll want to make sure that the designs stay parallel (wherever possible) despite being in separate files.

ttnghia commented 2 years ago

Thanks all. Then I'll go ahead and do this split. Will have a PR up shortly.

jrhemstad commented 2 years ago

I'm fine with this, but let's wait on splitting them up until we are ready to move them out of "experimental". Then we can split them as we move their namespaces.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

vyasr commented 1 year ago

I believe that addressing this is blocked on #11844, or at least should be conditioned on addressing that issue. Once the newer comparators are used everywhere, we can remove the old comparators and move the new ones into the places that we want them.

divyegala commented 1 year ago

@vyasr @bdice @ttnghia @GregoryKimball

I am about to go ahead and start on this issue, please let me know your thoughts. I prefer breaking these up as:

The reason I prefer to split them up this way is that we will be mirroring the sub-directory structure with the namespaces.

Any common code can be present in row_operators_common.cuh/hpp.

Furthermore, in my PR I will add some redundancy if any code in experimental relies on the current version of row_operators.cuh by copying over those sections. By doing this, when we are ready to drop experimental, we just delete the old row_operators.cuh, move up contents of experimental to the parent directory, and perform simple find-and-replace to remove experimental:: from the call sites.

vyasr commented 1 year ago

I have a slight preference for subdirectories over underscores as well, and I am fine with the temporary duplication.

bdice commented 1 year ago

This proposal sounds fine to me. Roughly how far are we from migrating entirely to "experimental" comparators? Are there any hard cases left, or are all the remaining instances relatively straightforward?

divyegala commented 1 year ago

@bdice there are a couple of hard cases left. The most notorious one being joins. The actual update is easy, of course, but it's going to be difficult testing each and every single join for nested types in terms of it being a mechanical, time consuming process.

That being said, I'm hoping we can drop experimental by the very end of 23.04 release, or very start of 23.06 release.

ttnghia commented 1 year ago

So you're proposing:

lexicographic/row_operators.cuh
equality/row_operators.cuh
...

instead of

row_operators/common.cuh
row_operators/lexicographic.cuh
...

Then what other files will be in these directories (lexicographic/, equality/, etc)? And where the row_operators_common.cuh/hpp files will be placed?

divyegala commented 1 year ago

@ttnghia no other files for now, but giving us the option to have comparator-specific files to put in these sub-directories in the future.

row_operators_common.cuh/hpp file will be in the parent directory, which in this case is: cudf/include/table/

I suggest this because the namespacing is super confusing right now. row::lexicographic::... but we have no directories called row or lexicographic

ttnghia commented 1 year ago

The namespace does not always align with directories. You can see many files like cudf/detail/something.hpp vs cudf/something/detail/something_detail.hpp which all have the same namespace hierarchy.

divyegala commented 1 year ago

That is a specific structure of the detail API that libcudf follows - I don't see where else cudf doesn't follow namespace <-> directory.

Alternatively, I would also be okay with row/lexicographic.cuh to follow the namespace row::lexicographic.

While I have been doing these updates to the experimental API, I always forget what the actual namespace of the comparators are because they don't make any sense when it comes to the directory/file naming.