lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

How to split long cuts into shorter ones without messing supervisions up? #1286

Closed mohsen-goodarzi closed 4 months ago

mohsen-goodarzi commented 4 months ago

I have a dataset that contains long audio files (around 1 hour each). I also have word-level transcriptions with time alignments, and I treat every word as a supervision segment. I want to cut the utterances into shorter segments of a specific length (such as 5 seconds). I don't care if the resulting cuts are not exactly 5 seconds long, as long as no cut boundary falls in the middle of a word. How can I do this with Lhotse?

If I call cuts.cut_into_windows(5, keep_excessive_supervisions=True), the supervision segments that cross a window boundary are duplicated in both adjacent cuts. If I call cuts.cut_into_windows(5, keep_excessive_supervisions=False), those supervision segments are lost. The only solution I have come up with is to use the first option, then loop over the cuts, extend each cut's duration to cover its supervisions, and filter out supervisions with a negative start. Is this the best solution? Is there a built-in method to extend a cut to cover its supervisions?

Any kind of help is appreciated.
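The duplicate-or-lose behaviour described above can be reproduced without Lhotse. The sketch below is plain Python, not Lhotse code: the `words_in_window` helper and its `keep_excessive` flag are hypothetical names that only mimic the effect of `keep_excessive_supervisions` as reported in this question, and the word timings are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical word record: (text, start, end), times in seconds.
Word = Tuple[str, float, float]


def words_in_window(
    words: List[Word], win_start: float, win_end: float, keep_excessive: bool
) -> List[str]:
    """Return the words assigned to the window [win_start, win_end)."""
    selected = []
    for text, start, end in words:
        overlaps = start < win_end and end > win_start
        inside = start >= win_start and end <= win_end
        # Words fully inside are always kept; words that merely overlap
        # the window are kept only when keep_excessive is True.
        if inside or (keep_excessive and overlaps):
            selected.append(text)
    return selected


# "world" straddles the 5-second boundary between the two windows.
words = [("hello", 4.0, 4.6), ("world", 4.8, 5.3), ("again", 5.5, 6.0)]
windows = [(0.0, 5.0), (5.0, 10.0)]

kept_true = [words_in_window(words, s, e, True) for s, e in windows]
kept_false = [words_in_window(words, s, e, False) for s, e in windows]

print(kept_true)   # "world" shows up in both windows
print(kept_false)  # "world" shows up in neither window
```

Neither setting assigns the boundary word to exactly one window, which is why some post-processing is needed.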

pzelasko commented 4 months ago

Look into the following CutSet methods: trim_to_supervisions, trim_to_supervision_groups, and trim_to_alignments.

For extending the cut there is extend. On a related note, there is also CutSet.merge_supervisions.

See if any of these help with your case.

mohsen-goodarzi commented 4 months ago

Thanks for the fast reply.

This is the way I did it:

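desired_cut_len = 5  # target window length in seconds
# Window the cuts, merge each window's supervisions, drop early starters, then trim.
tmp_cuts = (
    long_cuts.cut_into_windows(desired_cut_len)
    .merge_supervisions()
    .filter_supervisions(lambda s: s.start >= 0.0)
    .trim_to_supervisions()
)
short_cuts = []
for cut in tmp_cuts:
    # Extend the cut when its supervision is longer than the cut itself.
    if cut.duration < cut.supervisions[0].duration:
        cut.duration = cut.supervisions[0].duration
    short_cuts.append(cut)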

There is an extend_by method in CutSet, but it didn't help me because in my case (i.e. extending each cut to cover its supervisions) the amount of extension is different for every cut.

Anyway, the above snippet did the job for me.
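For comparison, the underlying goal (chunks of roughly the target length whose boundaries never fall inside a word) can also be sketched directly on the word alignments. This is a plain-Python illustration under stated assumptions, not Lhotse API: `split_at_word_boundaries` is a hypothetical helper, words are assumed to be sorted by start time and non-overlapping, and the timings are invented.

```python
from typing import List, Tuple

# Hypothetical word record: (text, start, end), times in seconds.
Word = Tuple[str, float, float]


def split_at_word_boundaries(
    words: List[Word], target_len: float = 5.0
) -> List[List[Word]]:
    """Greedily group time-sorted words into chunks.

    A chunk is closed as soon as it spans at least target_len seconds,
    so every boundary coincides with the end of a word.
    """
    chunks: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        current.append(word)
        chunk_start = current[0][1]
        if word[2] - chunk_start >= target_len:
            chunks.append(current)
            current = []
    if current:  # keep the trailing, possibly shorter, chunk
        chunks.append(current)
    return chunks


words = [
    ("a", 0.0, 2.0),
    ("b", 2.2, 4.8),
    ("c", 4.9, 5.6),
    ("d", 6.0, 9.0),
    ("e", 9.5, 12.0),
]
chunks = split_at_word_boundaries(words, target_len=5.0)
# Each chunk spans from its first word's start to its last word's end.
spans = [(chunk[0][1], chunk[-1][2]) for chunk in chunks]
print(spans)
```

Because each word is appended to exactly one chunk, no word is duplicated or dropped, and chunks may run slightly over the target length. The resulting spans could then be used as (start, duration) pairs when truncating the long cut.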