lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

Question about the returns of CutSet.mix() ? #1267

Closed by kobenaxie 5 months ago

kobenaxie commented 5 months ago

CutSet.mix() returns only the MixedCut when mix_prob=1.0, but returns both the original MonoCut and the MixedCut when mix_prob < 1.0:

from lhotse import Recording, CutSet

wav_file = "a.wav"
noise_file = "b.wav"

# One speech cut and one noise cut, each wrapped in its own single-item CutSet.
wav_cut = Recording.from_file(wav_file).to_cut()
noise_cut = Recording.from_file(noise_file).to_cut()

wav_cuts = CutSet.from_cuts([wav_cut])
noise_cuts = CutSet.from_cuts([noise_cut])

# Compare the number of returned cuts for mix_prob < 1.0 and mix_prob == 1.0.
for p in [0.1, 1.0]:
    mixed_cuts = wav_cuts.mix(
        cuts=noise_cuts,
        duration=None,
        snr=10,
        mix_prob=p,
        preserve_id=None,
        seed=42,
        random_mix_offset=True,
    ).to_eager()

    print(f"Mixed cuts lengths: {len(mixed_cuts)}")

Output:

Mixed cuts lengths: 2
Mixed cuts lengths: 1

Hi @pzelasko, is this a bug? When training an ASR model, I only want the mixed cut (MixedCut) for each input cut. Also, training an ASR model with the lhotse.dataset.CutMix transform takes about 2x as long as training without it.

pzelasko commented 5 months ago

Thank you, it's indeed an issue introduced recently in https://github.com/lhotse-speech/lhotse/pull/1244

I created a fix in #1268.
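
For reference, the behavior the reporter expects from mix_prob can be sketched in plain Python: each input cut produces exactly one output, which is either the mixed version (with probability mix_prob) or the unchanged original. This is a minimal illustration under stated assumptions, not lhotse's implementation and not the code from the fix. The names Cut, Mixed, and mix_with_prob are hypothetical stand-ins that do not exist in lhotse.

```python
import random
from dataclasses import dataclass
from typing import Iterable, Iterator, Sequence, Union


@dataclass(frozen=True)
class Cut:
    """Hypothetical stand-in for an unmixed cut (not lhotse's MonoCut)."""
    id: str


@dataclass(frozen=True)
class Mixed:
    """Hypothetical stand-in for a mixed cut (not lhotse's MixedCut)."""
    source: Cut
    noise: Cut
    snr: float


def mix_with_prob(
    cuts: Iterable[Cut],
    noise: Sequence[Cut],
    snr: float,
    mix_prob: float,
    seed: int = 42,
) -> Iterator[Union[Cut, Mixed]]:
    """Yield exactly one item per input cut: mixed with probability mix_prob,
    otherwise the original cut unchanged."""
    if not 0.0 <= mix_prob <= 1.0:
        raise ValueError("mix_prob must be in [0, 1]")
    if not noise:
        raise ValueError("noise must contain at least one cut")
    rng = random.Random(seed)
    for cut in cuts:
        # rng.random() is in [0, 1), so mix_prob=1.0 always mixes
        # and mix_prob=0.0 never does.
        if rng.random() < mix_prob:
            yield Mixed(source=cut, noise=rng.choice(noise), snr=snr)
        else:
            yield cut


# Usage: the output length equals the input length for every mix_prob.
speech_cuts = [Cut(f"utt-{i}") for i in range(100)]
noise_cuts = [Cut("noise-0")]
for p in (0.0, 0.1, 1.0):
    result = list(mix_with_prob(speech_cuts, noise_cuts, snr=10.0, mix_prob=p))
    n_mixed = sum(isinstance(c, Mixed) for c in result)
    print(f"mix_prob={p}: {len(result)} cuts, {n_mixed} mixed")
```

Under these semantics the dataset size does not change with mix_prob. The length of 2 reported above for a single input cut at mix_prob=0.1 is the symptom that both branches emitted a cut for the same input.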