pytorch / data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
BSD 3-Clause "New" or "Revised" License

Enable Append Mode in SaverIterDataPipe #1270

Open rravu3 opened 3 weeks ago

rravu3 commented 3 weeks ago

🚀 The feature

Currently, Saver only opens files in write mode and only lets users choose between binary and text mode. It would be useful to also allow appending to an existing file.
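To illustrate the requested behavior, here is a minimal sketch in plain Python. It does not use torchdata. `save_pairs` is a hypothetical helper, not the library's Saver API. It assumes the input yields `(filepath, data)` pairs and that the helper yields each path after writing, roughly how a saver datapipe reports its output. The only change from a write-only saver is that `"a"`/`"ab"` are accepted alongside `"w"`/`"wb"`.

```python
import os
import tempfile
from typing import Callable, Iterable, Iterator, Optional, Tuple, Union


def save_pairs(
    pairs: Iterable[Tuple[str, Union[str, bytes]]],
    mode: str = "w",
    filepath_fn: Optional[Callable[[str], str]] = None,
) -> Iterator[str]:
    """Hypothetical saver that also accepts append modes.

    Each item is a (filepath, data) pair; data is written to filepath and
    the (possibly remapped) path is yielded.
    """
    # Allow write or append, each in text or binary form.
    if mode not in {"w", "wb", "a", "ab"}:
        raise ValueError(f"unsupported mode: {mode!r}")
    for filepath, data in pairs:
        if filepath_fn is not None:
            filepath = filepath_fn(filepath)
        dirname = os.path.dirname(filepath)
        if dirname:
            os.makedirs(dirname, exist_ok=True)
        # "w"/"wb" truncate the file on every open; "a"/"ab" add to its end.
        with open(filepath, mode) as f:
            f.write(data)
        yield filepath


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "train.txt")
    # The generator is lazy, so consume it to actually perform the writes.
    list(save_pairs([(out, "first\n"), (out, "second\n")], mode="a"))
    with open(out) as f:
        print(f.read(), end="")
```

With `mode="a"`, both records end up in `train.txt`. With `mode="w"`, each write truncates the file, so only the last record would survive. That difference is what makes incremental, shard-by-shard output possible.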

Motivation, pitch

The ability to append would help when building data pipes that apply large-scale transformations to the original dataset, such as tokenizing, shuffling, and/or splitting it into training and evaluation sets.

Alternatives

No response

Additional context

No response

andrewkho commented 3 weeks ago

Thanks for the feature request, @rravu3. Unfortunately, we will be deprecating and then deleting DataPipes/DataLoaderV2; please see this issue: Future of torchdata and dataloading