pytorch / torcheval

A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to facilitate metric computation in distributed training and tools for PyTorch model evaluations.
https://pytorch.org/torcheval

support `rank` argument in sync_states #181

Closed JKSenthil closed 11 months ago

JKSenthil commented 11 months ago

Summary:

Context

TorchEval used to support syncing states to specific ranks. We removed this feature when rewriting the sync logic, to keep it simple. We want to add it back to allow more flexibility when checkpointing.

This Diff

Added support for a `rank` argument to the methods in synclib. The next (and final) diff will add support for the `rank` argument in toolkit, i.e. for methods such as sync_and_compute that act on the metric object.

Also refactored a few lines for readability.
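For readers unfamiliar with rank-targeted syncing, here is a minimal pure-Python sketch of the intended semantics. It assumes the behavior resembles a gather (one destination rank) versus an all-gather (every rank). The function `simulate_sync_states` and the `State` alias are hypothetical illustrations, not part of synclib, and no real process group is involved.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a per-rank metric state; not torcheval's real type.
State = Dict[str, float]


def simulate_sync_states(
    per_rank_states: List[State], rank: Optional[int] = None
) -> List[Optional[List[State]]]:
    """Return what each simulated rank would hold after a sync.

    rank=None -> every rank receives all states (all-gather style).
    rank=k    -> only rank k receives all states; other ranks get None
                 (gather style). This mapping is an assumption.
    """
    world_size = len(per_rank_states)
    if rank is not None and not 0 <= rank < world_size:
        raise ValueError(f"rank must be in [0, {world_size}), got {rank}")
    return [
        # Copy the states so simulated ranks do not share mutable objects.
        [dict(s) for s in per_rank_states] if rank is None or r == rank else None
        for r in range(world_size)
    ]


states = [
    {"num_correct": 3.0, "num_total": 4.0},  # state held by rank 0
    {"num_correct": 1.0, "num_total": 4.0},  # state held by rank 1
]
on_rank_0 = simulate_sync_states(states, rank=0)  # only rank 0 gets the states
on_all = simulate_sync_states(states)  # every rank gets the states
```

Under this assumption, only the destination rank holds the full set of states and can, for example, write them to a checkpoint, while the other ranks skip that work.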

Reviewed By: galrotem

Differential Revision: D49039113

facebook-github-bot commented 11 months ago

This pull request was exported from Phabricator. Differential Revision: D49039113

codecov[bot] commented 11 months ago

Codecov Report

Merging #181 (c24a722) into main (0b88a7d) will decrease coverage by 0.07%. The diff coverage is 6.20%.

@@            Coverage Diff             @@
##             main     #181      +/-   ##
==========================================
- Coverage   24.72%   24.66%   -0.07%     
==========================================
  Files         178      178              
  Lines       10252    10286      +34     
==========================================
+ Hits         2535     2537       +2     
- Misses       7717     7749      +32     
| Files Changed | Coverage | Diff coverage | Δ |
| --- | --- | --- | --- |
| torcheval/metrics/synclib.py | 14.47% | 3.22% | -0.11% |
| tests/metrics/test_synclib.py | 15.47% | 6.25% | -1.55% |
| torcheval/metrics/toolkit.py | 29.23% | 50.00% | +0.32% |
