Closed by njellinas 3 months ago
Is this on a mini-batch? In that case you can run cuts.mix(...).to_eager().load_audio(). We should change that exception message to be more informative in such cases.
If it's not on a mini-batch, then iterate over the cuts one by one and call .load_audio() on each, or use the collate_audio(cuts) function.
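A minimal sketch of the "iterate cuts one by one" option, assuming each cut exposes an id attribute and a load_audio() method as Lhotse cuts do. FakeCut and load_audio_per_cut are hypothetical names used only so the snippet runs without any audio on disk; with real data you would pass the CutSet itself.

```python
def load_audio_per_cut(cuts):
    """Call load_audio() on each cut in turn and return {cut.id: samples}."""
    audio_by_id = {}
    for cut in cuts:
        # With real Lhotse cuts this is expected to be a NumPy array
        # shaped (num_channels, num_samples).
        audio_by_id[cut.id] = cut.load_audio()
    return audio_by_id


class FakeCut:
    """Hypothetical stand-in for a Lhotse cut (not part of Lhotse)."""

    def __init__(self, cut_id, num_samples):
        self.id = cut_id
        self.num_samples = num_samples

    def load_audio(self):
        # One channel of silence, as a nested list instead of a NumPy array.
        return [[0.0] * self.num_samples]


audio = load_audio_per_cut([FakeCut("a", 16000), FakeCut("b", 8000)])
print({cut_id: len(samples[0]) for cut_id, samples in audio.items()})
```

Loading per cut keeps each signal at its own length; collate_audio(cuts) is the alternative when a single padded batch is more convenient.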
Yes, after I posted the issue I found the to_eager() operation, which helped. Overall, I think the transformation logic applied in dataset classes such as K2SpeechRecognitionDataset is fine, but working with raw audio or segments gets hard: the whole logic seems to be built around feature extraction, while many vocoders, codecs, etc. now require raw audio.
You can pass the AudioSamples input strategy to the K2 dataset to get raw audio tensors.
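A rough sketch of that suggestion, assuming a Lhotse install where K2SpeechRecognitionDataset accepts an input_strategy argument and AudioSamples lives in lhotse.dataset.input_strategies. build_raw_audio_dataset is a hypothetical helper name; the imports sit inside it so the definition itself does not require Lhotse to be installed.

```python
def build_raw_audio_dataset():
    """Build a K2SpeechRecognitionDataset that yields raw audio, not features."""
    # Imported lazily; both names are assumed to exist in the installed Lhotse.
    from lhotse.dataset import K2SpeechRecognitionDataset
    from lhotse.dataset.input_strategies import AudioSamples

    # AudioSamples replaces the default precomputed-features strategy, so the
    # batch inputs should be padded waveforms rather than feature matrices.
    return K2SpeechRecognitionDataset(input_strategy=AudioSamples())
```

Indexing the resulting dataset with a CutSet mini-batch (for example one produced by a Lhotse sampler) should then return a batch whose "inputs" entry holds the audio tensor.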
I have downloaded the MUSAN dataset and I want to augment the audio files I have in a CutSet, so I run:
but when I perform cuts.load_audio() I get the following:
Can you help me do the augmentation on the wavs? The documentation is not helpful at all; I cannot find anything related to direct cut transformations.