xorbitsai / xoscar

Python actor framework for heterogeneous computing.
https://xoscar.dev
Apache License 2.0
89 stars, 21 forks

FEAT: Implement NCCL interface for collective communication #56

Closed: YibinLiu666 closed this 1 year ago

YibinLiu666 commented 1 year ago

What do these changes do?

Related issue number

Fixes #xxxx

Check code requirements

codecov[bot] commented 1 year ago

Codecov Report

Merging #56 (a5ff6c1) into main (0648914) will decrease coverage by 5.25%. The diff coverage is 27.18%.

@@            Coverage Diff             @@
##             main      #56      +/-   ##
==========================================
- Coverage   93.96%   88.72%   -5.25%     
==========================================
  Files          46       47       +1     
  Lines        3647     3929     +282     
  Branches      705      757      +52     
==========================================
+ Hits         3427     3486      +59     
- Misses        145      357     +212     
- Partials       75       86      +11     
Flag         Coverage Δ
unittests    88.54% <26.51%> (-5.29%) ↓

Flags with carried forward coverage won't be shown.

Files Changed                                       Coverage Δ
python/xoscar/collective/process_group.py           57.14% <13.23%> (-39.57%) ↓
python/xoscar/collective/backend/nccl_backend.py    25.55% <25.55%> (ø)
python/xoscar/collective/core.py                    78.20% <50.00%> (-17.38%) ↓
python/xoscar/collective/utils.py                   58.82% <50.00%> (-4.82%) ↓
python/xoscar/collective/common.py                  100.00% <100.00%> (ø)

... and 1 file with indirect coverage changes
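For readers checking the headline numbers, here is a rough sketch of how the percentages in the report relate to the hit, miss, and partial counts. It assumes coverage is computed as hits divided by total lines (hits + misses + partials); the helper name `coverage_percent` is made up for illustration, and the last digit may differ slightly from the report because of rounding.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Return coverage as a percentage, assuming coverage = hits / total lines."""
    total_lines = hits + misses + partials
    return 100.0 * hits / total_lines


# Counts copied from the coverage diff in the report above.
main_lines = 3427 + 145 + 75   # matches the "Lines" row for main
pr_lines = 3486 + 357 + 86     # matches the "Lines" row for #56

main_cov = coverage_percent(3427, 145, 75)
pr_cov = coverage_percent(3486, 357, 86)
delta = pr_cov - main_cov

print(f"main: {main_cov:.2f}%  #56: {pr_cov:.2f}%  change: {delta:+.2f}")
```

The drop comes mostly from the 282 added lines, of which only 59 became hits, which is consistent with the low diff coverage the report cites.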

ChengjieLi28 commented 1 year ago

By the way, could you please add docstrings for the top-level interfaces such as init_process_group, new_group, and allreduce, for both the gloo and nccl backends? You can refer to PyTorch's documentation: https://pytorch.org/docs/stable/_modules/torch/distributed/distributed_c10d.html#init_process_group
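As a rough illustration of the kind of docstring being requested, here is a minimal sketch in the `Args:` / `Returns:` layout that PyTorch's distributed docs use. The function `allreduce_sketch` is a hypothetical single-process stand-in, not xoscar's actual `allreduce`; its name, parameter, and behavior are assumptions made only to show the docstring structure.

```python
from typing import List, Sequence


def allreduce_sketch(per_rank_values: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sum values element-wise across all ranks and give every rank the result.

    Hypothetical stand-in used only to illustrate docstring layout; it is not
    xoscar's ``allreduce`` and it runs in a single process.

    Args:
        per_rank_values: One equally sized sequence of numbers per rank.

    Returns:
        A list with one entry per rank, each holding the element-wise sum.

    Raises:
        ValueError: If no ranks are given or the sequences differ in length.
    """
    if not per_rank_values:
        raise ValueError("at least one rank is required")
    length = len(per_rank_values[0])
    if any(len(values) != length for values in per_rank_values):
        raise ValueError("all ranks must contribute the same number of elements")
    # Reduce column by column, then hand each rank its own copy of the total.
    total = [sum(column) for column in zip(*per_rank_values)]
    return [list(total) for _ in per_rank_values]


# Three simulated ranks, two elements each.
result = allreduce_sketch([[1, 2], [3, 4], [5, 6]])
```

The same layout (one-line summary, short description, then `Args`, `Returns`, and `Raises`) would carry over to init_process_group and new_group, with the backend-specific notes for gloo and nccl placed in the description.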