**Stefanwuu** opened 5 days ago
A couple of related thoughts (and more) are in #292. (This is for TF, but most of the discussion is generic and can be applied in the same way to PT, or even be backend-independent.)
In case of PT, I think an easy way right now is to implement that as another `IterDataPipe`, e.g. like `ChunkingIterDataPipe`. Currently, in our PT `Engine._create_data_loader`, the pipe transformations we construct are somewhat hardcoded, but we could make this more configurable/flexible, so that the user could put arbitrary own things in between.

Note that there are specific assumptions on what kind of data flows through the pipe (e.g. how seq lens are stored), and this is currently not well defined; we planned to replace that by the well-defined `TensorDict` at some point (#1302). So if you depend on the current behavior, this might break in the future.
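To make the pipe idea concrete, here is a minimal, backend-independent sketch of the core concatenation logic such an `IterDataPipe` could apply. All names here (`concat_seqs`, `max_len`) are illustrative, not the actual RETURNN/PyTorch API; a real implementation would wrap this in an `IterDataPipe.__iter__` and operate on the data dicts flowing through the pipe.

```python
# Illustrative sketch only: greedy concatenation of consecutive sequences.
# Not the RETURNN API; a real IterDataPipe would work on the pipe's data
# dicts and handle seq lens etc.

def concat_seqs(seqs, max_len):
    """Greedily concatenate consecutive sequences as long as the combined
    length stays within max_len; yield each finished concatenation."""
    buffer = []
    for seq in seqs:
        # Flush the buffer once adding the next sequence would exceed max_len.
        if buffer and len(buffer) + len(seq) > max_len:
            yield buffer
            buffer = []
        buffer = buffer + list(seq)
    if buffer:
        yield buffer
```

E.g. with `max_len=4`, the inputs `[1, 2]`, `[3, 4]`, `[5]` would come out as `[1, 2, 3, 4]` and `[5]`.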
Another approach would be to just extend `ConcatSeqsDataset`, or implement another, more flexible dataset; then the logic applies at the dataset level, independent of the backend. I personally would maybe leave `ConcatSeqsDataset` untouched and make some generic `DynConcatSeqsDataset` or so, where the user gives a sub-dataset (just like for `ConcatSeqsDataset`), plus a generic function which decides what sequences to concatenate, and another function where the user can do the concatenation in whatever way they want.
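A rough sketch of how such a `DynConcatSeqsDataset` interface could look (the class name comes from this discussion; it is not an existing RETURNN class, and `group_fn`/`concat_fn` are hypothetical parameter names):

```python
# Hypothetical sketch of a "DynConcatSeqsDataset"-style wrapper.
# The user supplies:
#  - group_fn(group, seq): decide whether seq should join the current group
#  - concat_fn(group): perform the actual concatenation of a finished group

class DynConcatSeqs:
    def __init__(self, sub_seqs, group_fn, concat_fn):
        self.sub_seqs = sub_seqs  # the wrapped sub-dataset's sequences
        self.group_fn = group_fn
        self.concat_fn = concat_fn

    def __iter__(self):
        group = []
        for seq in self.sub_seqs:
            # Start a new group when the user's predicate rejects this seq.
            if group and not self.group_fn(group, seq):
                yield self.concat_fn(group)
                group = []
            group.append(seq)
        if group:
            yield self.concat_fn(group)
```

For example, grouping by a total-length budget of 3 and concatenating by flattening:

```python
ds = DynConcatSeqs(
    sub_seqs=[[1], [2, 3], [4, 5, 6]],
    group_fn=lambda group, seq: sum(len(s) for s in group) + len(seq) <= 3,
    concat_fn=lambda group: sum(group, []),
)
# iterating ds yields [1, 2, 3], then [4, 5, 6]
```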
I personally also prefer the idea of a `DynConcatSeqsDataset`; maybe I can do something about this.
Independent of that, here is a link to my implementation of a forced-alignment restriction that allows training with raw-wav alignments: #1574.
Since concatenating sequences during training turns out to be helpful for some datasets, such new functionality might be desirable for future training use.