rwth-i6 / returnn

The RWTH extensible training framework for universal recurrent neural networks
http://returnn.readthedocs.io/

ConcatSeqsDataset with extended functionality #1573

Open Stefanwuu opened 5 days ago

Stefanwuu commented 5 days ago

Since concatenating sequences during training has turned out to be helpful for some datasets, some new functionality might be desirable for future training use.

albertz commented 5 days ago

A couple of related thoughts (and more) are in #292. (This is for TF, but most of the discussion is generic and can be applied in the same way to PT, or even be backend-independent.)

In the case of PT, I think an easy way right now is to implement that as another IterDataPipe, e.g. like ChunkingIterDataPipe. Currently, in our PT Engine._create_data_loader, it is somewhat hardcoded what pipe transformations we construct, but we could make this more configurable/flexible, so that the user can put arbitrary custom transformations in between.
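A minimal sketch of how such a pipe could look, assuming each item flowing through it is a dict mapping data keys to numpy arrays with time as the first axis (which, as the next paragraph notes, is not a well-defined contract); the class name ConcatPairsIterDataPipe and its pair-wise concatenation policy are purely illustrative:

```python
import numpy
from torch.utils.data import IterDataPipe


class ConcatPairsIterDataPipe(IterDataPipe):
    """Concatenates every two consecutive sequences (illustrative policy only).

    Assumes each item is a dict data_key -> numpy array with time as axis 0;
    this matches the current pipe convention only loosely and may break once
    the pipe data format is replaced by TensorDict (#1302).
    """

    def __init__(self, dataset: IterDataPipe):
        super().__init__()
        self._dataset = dataset

    def __iter__(self):
        pending = None
        for item in self._dataset:
            if pending is None:
                pending = item
                continue
            # Concatenate along the time axis, per data key.
            yield {
                key: numpy.concatenate([pending[key], item[key]], axis=0)
                for key in pending
            }
            pending = None
        if pending is not None:
            yield pending  # odd leftover sequence, passed through unchanged
```

Such a pipe would then be inserted into the chain that Engine._create_data_loader builds, once that chain is made configurable.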

Note that there are specific assumptions on what kind of data flows through the pipe (e.g. how seq lens are stored), and this is currently not well defined; we plan to replace it with the well-defined TensorDict at some point (#1302). So if you depend on the current behavior, it might break in the future.

Another approach would be to just extend ConcatSeqsDataset, or to implement another, more flexible dataset, so that the logic applies on the dataset level, independent of the backend. I personally would leave ConcatSeqsDataset untouched and make some generic DynConcatSeqsDataset or so, where the user gives a subdataset (just like for ConcatSeqsDataset), plus some generic function which decides what sequences to concatenate, and another function where the user can do the concatenation in whatever way he/she wants.
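A minimal sketch of how such an interface could look in the config; the class name DynConcatSeqsDataset and all option names here are hypothetical, just following the ConcatSeqsDataset convention of wrapping a subdataset:

```python
import numpy


def decide_concat_groups(seq_tags, seq_lens):
    """User-defined (hypothetical hook): group seq tags for concatenation,
    here greedily until some target total length is reached."""
    groups, cur, cur_len = [], [], 0
    for tag, length in zip(seq_tags, seq_lens):
        cur.append(tag)
        cur_len += length
        if cur_len >= 2000:  # example target length
            groups.append(cur)
            cur, cur_len = [], 0
    if cur:
        groups.append(cur)
    return groups


def concat_seqs(data_key, arrays):
    """User-defined (hypothetical hook): concatenate the raw arrays per data key."""
    return numpy.concatenate(arrays, axis=0)


train = {
    "class": "DynConcatSeqsDataset",  # hypothetical, does not exist yet
    "dataset": {"class": "HDFDataset", "files": ["train.hdf"]},
    "decide_concat_groups": decide_concat_groups,
    "concat_seqs": concat_seqs,
}
```

The two callbacks would keep the grouping policy and the actual concatenation fully in user hands, while the dataset wrapper stays backend-independent.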

Stefanwuu commented 5 days ago

> Another approach would be to just extend ConcatSeqsDataset, or to implement another, more flexible dataset, so that the logic applies on the dataset level, independent of the backend. I personally would leave ConcatSeqsDataset untouched and make some generic DynConcatSeqsDataset or so, where the user gives a subdataset (just like for ConcatSeqsDataset), plus some generic function which decides what sequences to concatenate, and another function where the user can do the concatenation in whatever way he/she wants.

I personally also prefer the idea of a DynConcatSeqsDataset; maybe I can do something about this.

Independently of that, here is a link to my implementation of a forced-alignment restriction that allows training with raw wav alignments: #1574.