rapidsai / distributed-join

Other
19 stars 12 forks source link

Update codebase to new cuDF API #10

Closed gaohao95 closed 4 years ago

gaohao95 commented 4 years ago

This PR intends to implement #6.

nsakharnykh commented 4 years ago

@jrhemstad can you take a look at this PR when you have time? mostly cuDF changes to update to the new libcudf++ API

gaohao95 commented 4 years ago

@jrhemstad Thanks a lot for the review. I really appreciate these feedback.

gaohao95 commented 4 years ago

Should this be a const&?

const is added in a following commit https://github.com/rapidsai/distributed-join/pull/10/commits/9ce2080062df4b8d58f2524be4c601f321328e1a#diff-492eca7c4eff57c31bfa5be3fb67f18fR170

jrhemstad commented 4 years ago

Do you have updated performance numbers after these changes?

gaohao95 commented 4 years ago

Do you have updated performance numbers after these changes?

Not yet. We will test it after we figure out some communication layer configurations (https://github.com/rapidsai/distributed-join/pull/11).

gaohao95 commented 4 years ago

https://github.com/rapidsai/cudf/pull/3748 is recently merged after 4d0976d, which is based on our initial PR. This may improve hash partition performance and align with my custom branch.

nsakharnykh commented 4 years ago

I tested with today's master ac143f10236b32842bfac7709a74477c2f8d0df6 but got similar timings, a little faster than before rapidsai/cudf#3748 got merged, but still slower than branch-0.8:

rank 0 gpu list 0,1,2,3,4,5,6,7 cpu bind 1-4 ndev mlx5_0:1
rank 1 gpu list 0,1,2,3,4,5,6,7 cpu bind 5-8 ndev mlx5_0:1
rank 2 gpu list 0,1,2,3,4,5,6,7 cpu bind 10-13 ndev mlx5_1:1
rank 3 gpu list 0,1,2,3,4,5,6,7 cpu bind 15-18 ndev mlx5_1:1
rank 4 gpu list 0,1,2,3,4,5,6,7 cpu bind 21-24 ndev mlx5_2:1
rank 5 gpu list 0,1,2,3,4,5,6,7 cpu bind 25-28 ndev mlx5_2:1
rank 6 gpu list 0,1,2,3,4,5,6,7 cpu bind 30-33 ndev mlx5_3:1
rank 7 gpu list 0,1,2,3,4,5,6,7 cpu bind 35-38 ndev mlx5_3:1
Device count: 8
Rank 0 select 0/8 GPU
Device count: 8
Rank 6 select 6/8 GPU
Device count: 8
Rank 1 select 1/8 GPU
Device count: 8
Rank 5 select 5/8 GPU
Device count: 8
Rank 4 select 4/8 GPU
Device count: 8
Rank 7 select 7/8 GPU
Device count: 8
Rank 2 select 2/8 GPU
Device count: 8
Rank 3 select 3/8 GPU
Elasped time (s) 0.607329
jrhemstad commented 4 years ago

It's not too surprising that things got a bit slower. There was some changes necessary in the join algorithm to add string support which may have slowed things down, but we'd need to see profiles to know exactly what the root cause is.

gaohao95 commented 4 years ago

Merging since we can get as good or even better performance than last year's number on legacy cuDF.