lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

save sdm files into a single mdm file to do gss #1221

Closed yuekaizhang closed 7 months ago

desh2608 commented 7 months ago

I see what you're trying to get at, but I think there may be a more elegant solution that does not require running sox commands during data preparation. You can create a Recording object that has multiple AudioSource entries, possibly one per microphone channel. Then, if you call load_audio with specific channels, it loads only those channels. This is better in two ways:

  1. data is not duplicated
  2. you can load any combination of channels from the recording

As an example, I would suggest looking at the mdm preparation in ICSI: https://github.com/lhotse-speech/lhotse/blob/master/lhotse/recipes/icsi.py
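
For illustration, here is a minimal sketch of that idea, not the ICSI recipe itself. It assumes the SDM files are PCM WAV files with the same length and sampling rate; the helper names (`channel_sources`, `wav_info`, `build_mdm_recording`) and the file names in the usage note are hypothetical. lhotse is imported lazily inside the last function, so the first two helpers run without it.

```python
import wave
from pathlib import Path
from typing import Dict, List, Tuple


def channel_sources(sdm_paths: List[str]) -> List[Dict]:
    """Assign each single-channel (SDM) file to one channel index.

    Hypothetical helper: returns plain dicts whose keys match the
    AudioSource fields (type, channels, source).
    """
    return [
        {"type": "file", "channels": [idx], "source": str(Path(p))}
        for idx, p in enumerate(sdm_paths)
    ]


def wav_info(path: str) -> Tuple[int, int]:
    """Return (sampling_rate, num_samples) of a PCM WAV file (stdlib only)."""
    with wave.open(path, "rb") as f:
        return f.getframerate(), f.getnframes()


def build_mdm_recording(recording_id: str, sdm_paths: List[str]):
    """Build one multi-channel Recording that points at the original files.

    Assumption: every SDM file has the same length and sampling rate,
    so the metadata of the first file is used for the whole recording.
    """
    # Imported here so the helpers above work without lhotse installed.
    from lhotse import AudioSource, Recording

    sampling_rate, num_samples = wav_info(sdm_paths[0])
    sources = [AudioSource(**spec) for spec in channel_sources(sdm_paths)]
    return Recording(
        id=recording_id,
        sources=sources,
        sampling_rate=sampling_rate,
        num_samples=num_samples,
        duration=num_samples / sampling_rate,
    )
```

With a recording built this way, something like `build_mdm_recording("session1", ["mic0.wav", "mic1.wav", "mic2.wav"]).load_audio(channels=[0, 2])` should read only the first and third files, and no merged copy of the audio is written to disk.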

yuekaizhang commented 7 months ago

Done, thanks.