lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0
956 stars, 219 forks

Implement conversion from CutSet to HuggingFace dataset #1398

Closed. domklement closed this 1 month ago

domklement commented 1 month ago

This PR implements a simple conversion from a CutSet containing MonoCuts with single-source Recordings to a HuggingFace dataset.

CutSet.to_huggingface_dataset: None -> Dataset converts the cutset into one of two formats, depending on whether every cut contains exactly one supervision or some cuts contain several. The formats are described in the method's docstring.
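
To illustrate the "one of two formats" dispatch, here is a minimal standalone sketch. It does not use lhotse or the PR's code: the Cut and Supervision classes, the cuts_to_columns helper, and all column names (id, audio, text, segments) are hypothetical stand-ins, and the actual layouts are the ones given in the method's docstring.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Supervision:
    # Hypothetical stand-in for a supervision segment.
    text: str
    start: float
    duration: float


@dataclass
class Cut:
    # Hypothetical stand-in for a MonoCut with a single-source recording.
    id: str
    audio_path: str
    supervisions: List[Supervision] = field(default_factory=list)


def cuts_to_columns(cuts: List[Cut]) -> Dict[str, List[Any]]:
    """Build column-oriented data, choosing the layout from the supervision counts."""
    columns: Dict[str, List[Any]] = {
        "id": [c.id for c in cuts],
        "audio": [c.audio_path for c in cuts],
    }
    # Flat layout only if every cut has exactly one supervision.
    if all(len(c.supervisions) == 1 for c in cuts):
        columns["text"] = [c.supervisions[0].text for c in cuts]
    else:
        # Nested layout: one list of segment dicts per cut (assumed shape).
        columns["segments"] = [
            [
                {"text": s.text, "start": s.start, "end": s.start + s.duration}
                for s in c.supervisions
            ]
            for c in cuts
        ]
    return columns


single = [Cut("c1", "a.wav", [Supervision("hello", 0.0, 1.5)])]
multi = single + [
    Cut("c2", "b.wav", [Supervision("hi", 0.0, 1.0), Supervision("there", 1.0, 1.0)])
]
flat_columns = cuts_to_columns(single)
nested_columns = cuts_to_columns(multi)
```

A dict of columns like this could then be passed to datasets.Dataset.from_dict; whether the PR builds its dataset that way is not stated here.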

pzelasko commented 1 month ago

Thanks!!!