Open alex-hh opened 1 month ago
Following up on the discussion in #6623 and #7198, I thought this would be pretty useful for my case, so I had a go at implementing it.
My main motivation is to be able to call `iterable_dataset.repeat(None).take(samples_per_epoch)` to safely avoid timeout issues in a distributed training setting. This would provide a straightforward workaround for several open issues related to this situation: https://github.com/huggingface/datasets/issues/6437, https://github.com/huggingface/datasets/issues/6594, https://github.com/huggingface/datasets/issues/6623, https://github.com/huggingface/datasets/issues/6719.
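To illustrate the intended semantics, here is a minimal plain-Python sketch of "repeat forever, then take a fixed number of samples". It does not use the `datasets` library; `repeat_stream`, `take_n`, and `make_shard` are hypothetical names chosen for this example, and the real implementation in this PR may differ. The assumption is that the point of the pattern is for every epoch to yield exactly `samples_per_epoch` items regardless of how many examples the underlying stream holds.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def repeat_stream(make_iter: Callable[[], Iterable[T]], num_times: Optional[int]) -> Iterator[T]:
    """Hypothetical helper: replay a re-creatable stream num_times, or forever if None."""
    passes = 0
    while num_times is None or passes < num_times:
        yielded_any = False
        for item in make_iter():  # start a fresh pass over the source
            yielded_any = True
            yield item
        if not yielded_any:
            return  # an empty source would otherwise loop forever
        passes += 1


def take_n(stream: Iterable[T], n: int) -> Iterator[T]:
    """Hypothetical helper: yield at most n items from the stream."""
    return islice(stream, n)


def make_shard() -> Iterator[str]:
    """Stand-in for one worker's shard, which holds only three examples."""
    return iter(["a", "b", "c"])


samples_per_epoch = 7
epoch = list(take_n(repeat_stream(make_shard, None), samples_per_epoch))
print(epoch)  # ['a', 'b', 'c', 'a', 'b', 'c', 'a']
```

With this pattern, a worker whose shard is shorter than its peers' wraps around instead of running out early, so each worker produces the same number of samples per epoch.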
@lhoestq let me know if this looks on the right track!