Closed gaohao95 closed 4 years ago
@jrhemstad can you take a look at this PR when you have time? mostly cuDF changes to update to the new libcudf++ API
@jrhemstad Thanks a lot for the review. I really appreciate these feedback.
Should this be a
const&
?
const
is added in a following commit https://github.com/rapidsai/distributed-join/pull/10/commits/9ce2080062df4b8d58f2524be4c601f321328e1a#diff-492eca7c4eff57c31bfa5be3fb67f18fR170
Do you have updated performance numbers after these changes?
Do you have updated performance numbers after these changes?
Not yet. We will test it after we figure out some communication layer configurations (https://github.com/rapidsai/distributed-join/pull/11).
https://github.com/rapidsai/cudf/pull/3748 is recently merged after 4d0976d, which is based on our initial PR. This may improve hash partition performance and align with my custom branch.
I tested with today's master ac143f10236b32842bfac7709a74477c2f8d0df6
but got similar timings, a little faster than before rapidsai/cudf#3748 got merged, but still slower than branch-0.8
:
rank 0 gpu list 0,1,2,3,4,5,6,7 cpu bind 1-4 ndev mlx5_0:1
rank 1 gpu list 0,1,2,3,4,5,6,7 cpu bind 5-8 ndev mlx5_0:1
rank 2 gpu list 0,1,2,3,4,5,6,7 cpu bind 10-13 ndev mlx5_1:1
rank 3 gpu list 0,1,2,3,4,5,6,7 cpu bind 15-18 ndev mlx5_1:1
rank 4 gpu list 0,1,2,3,4,5,6,7 cpu bind 21-24 ndev mlx5_2:1
rank 5 gpu list 0,1,2,3,4,5,6,7 cpu bind 25-28 ndev mlx5_2:1
rank 6 gpu list 0,1,2,3,4,5,6,7 cpu bind 30-33 ndev mlx5_3:1
rank 7 gpu list 0,1,2,3,4,5,6,7 cpu bind 35-38 ndev mlx5_3:1
Device count: 8
Rank 0 select 0/8 GPU
Device count: 8
Rank 6 select 6/8 GPU
Device count: 8
Rank 1 select 1/8 GPU
Device count: 8
Rank 5 select 5/8 GPU
Device count: 8
Rank 4 select 4/8 GPU
Device count: 8
Rank 7 select 7/8 GPU
Device count: 8
Rank 2 select 2/8 GPU
Device count: 8
Rank 3 select 3/8 GPU
Elasped time (s) 0.607329
It's not too surprising that things got a bit slower. There was some changes necessary in the join algorithm to add string support which may have slowed things down, but we'd need to see profiles to know exactly what the root cause is.
Merging since we can get as good or even better performance than last year's number on legacy cuDF.
This PR intends to implement #6.