How should I extract the features of noisy speech after mixing?

lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.

https://lhotse.readthedocs.io/en/latest/

Apache License 2.0

936 stars 214 forks

How should I extract the features of noisy speech after mixing? #1202

Open huhuqwaszxedc opened 11 months ago

huhuqwaszxedc commented 11 months ago

Hello, my CutSet was obtained with CutSet.mix, so all of its cuts are MixedCut. I used the compute_and_store_features_batch function, but the features in the output CutSet only contain the first track (i.e. the source audio). If I want the features of the noisy speech after mixing, how should I extract them? Thank you very much for your work!

desh2608 commented 11 months ago

By default, the features should already be extracted from the "mixed" speech, not just the first track. compute_and_store_features_batch calls load_audio internally, which has mixed=True set by default (https://github.com/lhotse-speech/lhotse/blob/c5f26afd100885b86e4244eeb33ca1986f3fa923/lhotse/cut/mixed.py#L1027). If that is not what you see, you may need to step through the code with pdb to find where the behavior diverges.
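Before reaching for pdb, a quick sanity check on the audio itself may help narrow things down. The sketch below is only an illustration, not part of lhotse: `mixed_differs_from_reference` and `check_cut` are hypothetical helper names, and `check_cut` assumes that `cut` is a MixedCut whose first track starts at offset 0 and exposes its underlying cut as `cut.tracks[0].cut`.

```python
import numpy as np


def mixed_differs_from_reference(mixed, reference, atol=1e-6):
    """Return True if `mixed` is not (numerically) the same signal as `reference`.

    Both inputs may be 1-D (samples,) or 2-D (channels, samples); only the
    first channel is compared, over the overlapping number of samples.
    """
    mixed = np.atleast_2d(np.asarray(mixed, dtype=np.float32))[0]
    reference = np.atleast_2d(np.asarray(reference, dtype=np.float32))[0]
    n = min(mixed.shape[0], reference.shape[0])
    return not np.allclose(mixed[:n], reference[:n], atol=atol)


def check_cut(cut):
    """Hypothetical helper: compare a MixedCut's mixed audio to its first track.

    Assumes `cut` is a lhotse MixedCut; load_audio() mixes the tracks by
    default (mixed=True), as described in the comment above.
    """
    mixed = cut.load_audio()
    first_track = cut.tracks[0].cut.load_audio()
    return mixed_differs_from_reference(mixed, first_track)


if __name__ == "__main__":
    # Synthetic stand-ins for speech and noise, so the sketch runs without lhotse.
    t = np.linspace(0.0, 1.0, 16000, dtype=np.float32)
    speech = np.sin(2 * np.pi * 220.0 * t)
    noise = 0.1 * np.random.default_rng(0).standard_normal(16000).astype(np.float32)
    print(mixed_differs_from_reference(speech + noise, speech))
    print(mixed_differs_from_reference(speech, speech))
```

If `check_cut` returns False for your cuts, the noise is probably not reaching the mix at all (for example, because of a very high SNR or a noise track that does not overlap the speech), which would point at the mixing step and not at the feature extraction.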