lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0
904 stars, 204 forks

Extending Lhotse dataloading to text/multimodal data #1295

Closed pzelasko closed 4 months ago

pzelasko commented 4 months ago

This PR adds very basic support for incorporating text-only data into Lhotse samplers to enable text and multimodal dataloading. Highlights:

This stretches the original scope of Lhotse a bit, but I feel like it's worth it: we've accumulated a bunch of solid techniques here, and it'd be a pity to have to use something completely different for multimodal modeling, especially when so few changes are required to make it work here. Would love to know your thoughts @danpovey @csukuangfj @desh2608 @m-wiesner

pzelasko commented 4 months ago

Note that this is an experimental feature: let us know if you run into any issues with it.
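
For anyone who wants a feel for the general idea, here is a minimal, library-free sketch of batching text-only samples under a size budget, analogous to how audio batches are typically capped by total duration. It deliberately does not use Lhotse's API: `ToyTextExample` and `batch_by_token_budget` are hypothetical names, and both the whitespace tokenizer and the use of token count as the batch-size measure are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class ToyTextExample:
    """Hypothetical stand-in for a text-only sample (not a Lhotse class)."""

    text: str

    @property
    def num_tokens(self) -> int:
        # Whitespace tokenization, used here only for illustration.
        return len(self.text.split())


def batch_by_token_budget(
    examples: Iterable[ToyTextExample], max_tokens: int
) -> Iterator[List[ToyTextExample]]:
    """Greedily group examples so that each batch stays within max_tokens.

    An example that alone exceeds max_tokens is emitted as a batch of one.
    """
    batch: List[ToyTextExample] = []
    total = 0
    for ex in examples:
        # Close the current batch if adding this example would exceed the budget.
        if batch and total + ex.num_tokens > max_tokens:
            yield batch
            batch, total = [], 0
        batch.append(ex)
        total += ex.num_tokens
    if batch:
        yield batch


examples = [ToyTextExample(t) for t in ["a b c", "d e", "f g h i", "j"]]
batches = list(batch_by_token_budget(examples, max_tokens=5))
batch_sizes = [[ex.num_tokens for ex in b] for b in batches]
print(batch_sizes)
```

The same greedy loop would work for any sample type that exposes a numeric size, which is roughly why mixing text and audio in one sampling scheme is plausible; the actual implementation in this PR may differ.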