lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

MUSAN mix to current CutSet: Cannot load audio of cuts in a lazy CutSet. #1376

Closed by njellinas 1 month ago

njellinas commented 1 month ago

I have downloaded the MUSAN dataset and I want to augment the audio files I have in a CutSet, so I run:

cuts.mix(cuts=self.musan_cuts, snr=[10, 20], mix_prob=0.5, preserve_id=True)

but when I call cuts.load_audio() on the result I get the following:

Cannot load audio of cuts in a lazy CutSet.

Can you help me apply the augmentation to the wavs? The documentation did not help here: I cannot find anything about applying transformations directly to cuts.

pzelasko commented 1 month ago

Is this on a mini-batch? In that case you can run cuts.mix(...).to_eager().load_audio(). We should change that exception message to be more informative in such cases.

If it's not on a mini-batch, then iterate over the cuts one by one and call .load_audio() on each, or use the collate_audio(cuts) function.
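
The two routes above can be sketched roughly as follows. This is a minimal sketch, not a definitive recipe: the function names load_mixed_batch and iter_mixed_audio are hypothetical, cuts and noise_cuts are assumed to be lhotse CutSet objects, and the mix() arguments are reduced to snr and mix_prob from the original call.

```python
def load_mixed_batch(cuts, noise_cuts):
    """Mini-batch case: mix, materialize the lazy result, then load all audio.

    `cuts` and `noise_cuts` are assumed to be lhotse CutSet objects.
    """
    # mix() yields a lazy CutSet, which is why load_audio() fails on it directly.
    mixed = cuts.mix(cuts=noise_cuts, snr=[10, 20], mix_prob=0.5)
    # to_eager() reads every cut into memory, so use it on small sets only.
    return mixed.to_eager().load_audio()


def iter_mixed_audio(cuts, noise_cuts):
    """Larger sets: keep the CutSet lazy and load one cut at a time."""
    mixed = cuts.mix(cuts=noise_cuts, snr=[10, 20], mix_prob=0.5)
    for cut in mixed:
        # Each individual cut can load its own (mixed) audio.
        yield cut.id, cut.load_audio()
```

The first helper suits a mini-batch that fits in memory; the second avoids materializing the whole set. If padded tensors are needed instead of per-cut arrays, collate_audio can be applied to the cuts of a mini-batch, as mentioned above.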

njellinas commented 1 month ago

Yes, after I posted the issue I found the to_eager() operation, which helped. Overall, I think the transformation logic applied in dataset classes like K2SpeechRecognitionDataset is fine. But working with raw audio or segments gets hard, because the whole logic seems to be built around feature extraction, while many vocoders, codecs, etc. now require raw audio.

pzelasko commented 1 month ago

You can pass the AudioSamples input strategy to K2SpeechRecognitionDataset to get raw audio tensors.
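
A minimal sketch of that setup, combined with MUSAN mixing through the CutMix cut transform. The helper name build_raw_audio_dataset is hypothetical, noise_cuts is assumed to be a CutSet of MUSAN cuts, and the snr and p values simply mirror the ones from the original question.

```python
def build_raw_audio_dataset(noise_cuts):
    """Build a K2SpeechRecognitionDataset that returns raw audio, not features.

    `noise_cuts` is assumed to be a lhotse CutSet with the MUSAN cuts.
    """
    # Imported inside the function so lhotse is only required when this is called.
    from lhotse.dataset import CutMix, K2SpeechRecognitionDataset
    from lhotse.dataset.input_strategies import AudioSamples

    return K2SpeechRecognitionDataset(
        # Mix noise into roughly half of the cuts at an SNR between 10 and 20 dB.
        cut_transforms=[CutMix(cuts=noise_cuts, snr=(10, 20), p=0.5)],
        # AudioSamples loads waveforms instead of precomputed features.
        input_strategy=AudioSamples(),
    )
```

With this strategy, the "inputs" entry of each batch should hold padded waveform tensors rather than feature matrices, so the augmentation runs inside the dataset and no explicit to_eager() call is needed.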