This will be useful for distributing a dataset across workers other than PyTorch ones, e.g. Spark.
I also renamed .n_shards -> .num_shards for consistency and kept the old name for backward compatibility. There are also a few changes in internal functions for consistency (world_size -> num_shards, rank -> index).
Breaking change: the new default for contiguous in Dataset.shard() is True. Imo this is not a big deal, since I couldn't find any usage of contiguous=False internally (we always use contiguous=True for map-style datasets since it's more optimized) or in the wild.
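For reference, here is a rough sketch of how the two modes differ, written in plain Python rather than taken from the library code. `shard_indices` is a hypothetical helper, and giving the leftover rows to the first shards is an assumption made for illustration.

```python
def shard_indices(num_rows, num_shards, index, contiguous=True):
    """Return the row indices that shard `index` would receive.

    Hypothetical helper for illustration only, not the library implementation.
    """
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    if contiguous:
        # Split rows into consecutive blocks; the first `mod` shards get one extra row
        # (assumed remainder policy).
        div, mod = divmod(num_rows, num_shards)
        start = div * index + min(index, mod)
        end = start + div + (1 if index < mod else 0)
        return list(range(start, end))
    # Non-contiguous: take every num_shards-th row, starting at `index`.
    return list(range(index, num_rows, num_shards))


# 10 rows split across 3 shards
contiguous_shards = [shard_indices(10, 3, i, contiguous=True) for i in range(3)]
strided_shards = [shard_indices(10, 3, i, contiguous=False) for i in range(3)]
```

With contiguous=True each shard is a single slice of the dataset, whereas contiguous=False interleaves rows across shards, so code relying on the old default would see different rows per shard.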