lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

Function to merge short sentences into a long sentence #1293

Open yuguochencuc opened 4 months ago

yuguochencuc commented 4 months ago

Hello, is there a function in lhotse for concatenating multiple sentences in a CutSet that share the same speaker (or have similar IDs) into one long sentence? I have already extracted the manifests as jsonl.gz, and I want to merge the sentences of speakers with similar IDs into one long sentence.

pzelasko commented 4 months ago

You can do that by grouping the cuts (for example, by speaker) and appending each group, e.g.

from lhotse.cut.set import CutSet, append_cuts

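cuts = CutSet.from_file(...)
speakers = cuts.speakers  # speaker IDs present in the CutSet

# Build one long cut per speaker. This rescans the CutSet once per speaker;
# there are more efficient ways of doing this.
long_cuts_per_speaker = {
    s: append_cuts(cuts.filter(lambda c: c.supervisions[0].speaker == s))
    for s in speakers
}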

Of course you can also add silence in between with cut.pad(cut.duration + pause_duration).append(another_cut), mix noises in with .mix(noise_cut), etc.
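
As a rough illustration of the "more efficient" route mentioned above, the grouping can be done in a single pass instead of filtering once per speaker. The sketch below is plain Python and does not depend on lhotse; `group_by_key` is a hypothetical helper name, not a lhotse function.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def group_by_key(items: Iterable[T], key: Callable[[T], Hashable]) -> Dict[Hashable, List[T]]:
    """Group items by key(item) in one pass, preserving input order within each group.

    Hypothetical helper for illustration; it is not part of lhotse.
    """
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


# Toy data standing in for cuts: (speaker_id, utterance_id) pairs.
toy = [("spk1", "utt1"), ("spk2", "utt2"), ("spk1", "utt3")]
by_speaker = group_by_key(toy, key=lambda pair: pair[0])
```

With real cuts, the key would presumably be `lambda c: c.supervisions[0].speaker`, as in the snippet above, and each resulting list could then be passed to `append_cuts`. That assumes every cut has at least one supervision.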

yuguochencuc commented 4 months ago

Thank you very much for your reply! Can I also control the length of each merged sentence? For example, if I need to merge several short sentences into segments of 10-20 s, should I decide what to merge based on cut.duration?

pzelasko commented 4 months ago

Yes. You can also look at the CutConcatenate transform, which now supports a max_duration argument.
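
To make the idea of a duration cap concrete, here is a minimal sketch of greedy packing by duration in plain Python. It is not CutConcatenate's implementation; `pack_by_duration` and its `gap` parameter are hypothetical names chosen for illustration, and the durations stand in for `cut.duration` values.

```python
from typing import List, Sequence


def pack_by_duration(
    durations: Sequence[float], max_duration: float, gap: float = 0.0
) -> List[List[int]]:
    """Greedily group consecutive items so each group's total stays within max_duration.

    Returns lists of indices into `durations`. `gap` is the silence (in seconds)
    assumed between neighbouring items in a group. An item that is longer than
    max_duration on its own ends up in a group by itself.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    total = 0.0
    for idx, dur in enumerate(durations):
        extra = dur + (gap if current else 0.0)
        if current and total + extra > max_duration:
            # Adding this item would exceed the cap: close the current group.
            groups.append(current)
            current, total = [], 0.0
            extra = dur
        current.append(idx)
        total += extra
    if current:
        groups.append(current)
    return groups


# Durations in seconds, capped at 20 s per merged group.
example_groups = pack_by_duration([4.0, 5.0, 6.0, 8.0, 3.0, 12.0], max_duration=20.0)
```

Each index group could then be concatenated with `append_cuts`, as in the earlier reply. This sketch only enforces the upper bound, so a group may come out shorter than the desired 10 s, and the cuts would need to be grouped by speaker first if speakers must not be mixed.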