xorbitsai / xoscar

Python actor framework for heterogeneous computing.
https://xoscar.dev
Apache License 2.0
91 stars, 21 forks

FEAT: Support nccl in collective communication #52

Closed. RandomY-2 closed this 1 year ago

RandomY-2 commented 1 year ago

What do these changes do?

Adds GPU collective communication using CuPy's NcclCommunicator. The approach is similar to alpa's NCCL collective group (https://github.com/alpa-projects/alpa/blob/main/alpa/collective/collective_group/nccl_collective_group.py), but this PR also tries to implement the APIs that are missing there. A process group class will follow once #46 is finished.

The test cases are similar to the pygloo test cases: https://github.com/xorbitsai/xoscar/blob/main/python/xoscar/collective/tests/test_pygloo.py
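For readers unfamiliar with CuPy's NCCL bindings, here is a minimal sketch of a sum-allreduce across the GPUs visible to one process. It is not the code from this PR (`xoscar_cupy.py` is not shown here) and it sidesteps the multi-process setup the PR targets. The helper names `nccl_available` and `allreduce_sum_local_gpus` are hypothetical, and the sketch assumes a CuPy build with NCCL support and at least one CUDA device.

```python
"""Sketch: sum-allreduce across local GPUs with CuPy's NCCL bindings."""


def nccl_available() -> bool:
    """Return True if CuPy, its NCCL bindings, and at least one GPU are usable."""
    try:
        import cupy
        from cupy.cuda import nccl

        if not hasattr(nccl, "NcclCommunicator"):
            return False
        return cupy.cuda.runtime.getDeviceCount() >= 1
    except Exception:
        # Missing CuPy, missing NCCL, or no CUDA device.
        return False


def allreduce_sum_local_gpus(n_elems: int = 4):
    """Allreduce (sum) one float32 buffer per visible GPU; return host copies."""
    import cupy
    from cupy.cuda import nccl

    n_dev = cupy.cuda.runtime.getDeviceCount()
    # Single-process setup: one communicator per local device.
    comms = nccl.NcclCommunicator.initAll(list(range(n_dev)))

    bufs = []
    for dev in range(n_dev):
        with cupy.cuda.Device(dev):
            # Device `dev` contributes a buffer filled with dev + 1.
            bufs.append(cupy.full(n_elems, dev + 1, dtype=cupy.float32))

    # Group the per-device calls so NCCL launches them together.
    nccl.groupStart()
    for dev, (comm, buf) in enumerate(zip(comms, bufs)):
        with cupy.cuda.Device(dev):
            comm.allReduce(
                buf.data.ptr,   # send buffer
                buf.data.ptr,   # receive buffer (in-place reduction)
                n_elems,
                nccl.NCCL_FLOAT32,
                nccl.NCCL_SUM,
                cupy.cuda.Stream.null.ptr,
            )
    nccl.groupEnd()

    results = []
    for dev, buf in enumerate(bufs):
        with cupy.cuda.Device(dev):
            cupy.cuda.Stream.null.synchronize()
            results.append(cupy.asnumpy(buf))
    return results


if __name__ == "__main__":
    if nccl_available():
        print(allreduce_sum_local_gpus())
    else:
        print("CuPy with NCCL and a GPU are required; skipping.")
```

With `n` devices, every returned array should hold `n * (n + 1) / 2` in each slot. A multi-process variant would instead share the result of `nccl.get_unique_id()` between ranks and construct `nccl.NcclCommunicator(world_size, unique_id, rank)` in each process.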

Related issue number

Related #22

Check code requirements

codecov[bot] commented 1 year ago

Codecov Report

Merging #52 (9d44325) into main (c329d28) will decrease coverage by 23.81%. The diff coverage is 27.83%.

@@             Coverage Diff             @@
##             main      #52       +/-   ##
===========================================
- Coverage   93.00%   69.20%   -23.81%     
===========================================
  Files          42       43        +1     
  Lines        3361     3455       +94     
  Branches      376      364       -12     
===========================================
- Hits         3126     2391      -735     
- Misses        155      948      +793     
- Partials       80      116       +36     
Flag Coverage Δ
unittests 69.20% <27.83%> (-23.75%) :arrow_down:

Flags with carried forward coverage won't be shown.

Files Changed Coverage Δ
python/xoscar/collective/xoscar_cupy.py 27.83% <27.83%> (ø)

... and 29 files with indirect coverage changes

ChengjieLi28 commented 1 year ago

Thanks to @RandomY-2 for his contribution. Work on this PR has moved to #56, so this PR is closed for now.