Open rouseabout opened 1 month ago
I have addressed everything except for `restore_orig_sr=True`. I am not sure how to achieve that!
You are very close! Add a parameter `restore_orig_sr=True` in `def narrowband(self, ...)` for cut and recording, and pass the provided argument to the `Narrowband` constructor. Then you can extend the condition for the second resampling to `if self.restore_orig_sr and sampling_rate != 8000`.
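A minimal sketch of the suggested condition, for illustration only. `NarrowbandSketch`, `resample_linear`, and the `codec` callable are hypothetical stand-ins, not Lhotse's actual `Narrowband` class, and the linear-interpolation resampler is a placeholder for a proper one.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

import numpy as np


def resample_linear(x: np.ndarray, sr_in: int, sr_out: int) -> np.ndarray:
    """Placeholder resampler based on linear interpolation (hypothetical helper)."""
    if sr_in == sr_out:
        return x
    n_out = int(round(len(x) * sr_out / sr_in))
    t_in = np.arange(len(x)) / sr_in
    t_out = np.arange(n_out) / sr_out
    return np.interp(t_out, t_in, x)


@dataclass
class NarrowbandSketch:
    """Hypothetical stand-in for the transform discussed in this thread."""

    codec: Callable[[np.ndarray], np.ndarray]  # encode + decode at 8 kHz
    restore_orig_sr: bool = True

    def __call__(self, samples: np.ndarray, sampling_rate: int) -> Tuple[np.ndarray, int]:
        orig_len = len(samples)
        # First resampling: bring the audio down to the codec's 8 kHz rate.
        if sampling_rate != 8000:
            samples = resample_linear(samples, sampling_rate, 8000)
        samples = self.codec(samples)
        # Second resampling: only when the caller asked for the original rate back.
        if self.restore_orig_sr and sampling_rate != 8000:
            samples = resample_linear(samples, 8000, sampling_rate)
            # Pad or trim so the sample count matches the input exactly.
            samples = np.pad(samples, (0, max(0, orig_len - len(samples))))[:orig_len]
            return samples, sampling_rate
        return samples, 8000
```

With `restore_orig_sr=False` the returned sampling rate and sample count differ from the input, so any metadata describing the audio has to change as well.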
Done, but something extra is needed, because when I apply the transformation with `restore_orig_sr=False` the following exception occurs:
AudioLoadingError: The number of declared samples in the recording diverged from the one obtained when loading audio (offset=0, duration=19.22419501133787). This could be internal Lhotse's error or a faulty transform implementation. Please report this issue in Lhotse and show the following: diff=693887, audio.shape=(1, 153900), recording=Recording(id='0_nb_lpc10', sources=[AudioSource(type='file', channels=[0], source='/home/user/workspace/rtvalid/0.wav')], sampling_rate=44100, num_samples=847787, duration=19.22419501133787, channel_ids=[0], transforms=[{'name': 'Narrowband', 'kwargs': {'codec': 'lpc10', 'restore_orig_sr': False}}])
If you don't restore the original sampling rate, you'll have to update both the `sampling_rate` and `num_samples` properties on the Recording object.
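The metadata update can be sketched as follows. `RecordingMeta` and `narrowband_meta` are hypothetical names used only for illustration; they are not part of Lhotse. The `frame_size` argument is an assumption: the log above reports 153900 decoded samples, which is what you get when the 8 kHz signal is padded to whole 180-sample frames (22.5 ms, the LPC-10 frame length), so the sketch rounds up to a frame boundary.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RecordingMeta:
    """Hypothetical stand-in for the two Recording fields that must stay in sync."""

    sampling_rate: int
    num_samples: int


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def narrowband_meta(
    meta: RecordingMeta,
    restore_orig_sr: bool,
    target_sr: int = 8000,
    frame_size: int = 1,  # assumption: 180 for a codec that works on 22.5 ms frames
) -> RecordingMeta:
    # If the original rate is restored, the declared metadata stays valid.
    if restore_orig_sr or meta.sampling_rate == target_sr:
        return meta
    # Number of samples after resampling to the target rate, rounded up.
    resampled = ceil_div(meta.num_samples * target_sr, meta.sampling_rate)
    # Round up to a whole number of codec frames.
    padded = ceil_div(resampled, frame_size) * frame_size
    return replace(meta, sampling_rate=target_sr, num_samples=padded)


meta = RecordingMeta(sampling_rate=44100, num_samples=847787)
print(narrowband_meta(meta, restore_orig_sr=False, frame_size=180))
```

In practice it may be safer to take `num_samples` from the shape of the audio actually returned by the codec rather than predicting it, since the exact rounding depends on the resampler and codec in use.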
This patch adds an audio codec transformation.
I have found that when applying K2 ASR to speech compressed with mulaw, it is advantageous to augment the training data with these codecs. The transformation resamples the input audio to 8 kHz, encodes then decodes using the specified codec, then restores the original sample rate (e.g. 16 kHz).
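To illustrate the encode-then-decode step, here is a small sketch of a mu-law round trip using the continuous companding formula with 8-bit quantization. It is an approximation for illustration: it is not the segmented G.711 codec and not the implementation in this patch, and the function names are made up.

```python
import numpy as np

MU = 255.0  # companding parameter used for 8-bit mu-law


def mulaw_encode(x: np.ndarray) -> np.ndarray:
    """Compress samples in [-1, 1] to 8-bit codes (continuous mu-law formula)."""
    x = np.clip(x, -1.0, 1.0)
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    # Map [-1, 1] onto the 256 available code values.
    return np.round((y + 1.0) / 2.0 * 255.0).astype(np.uint8)


def mulaw_decode(codes: np.ndarray) -> np.ndarray:
    """Expand 8-bit codes back to samples in [-1, 1]."""
    y = codes.astype(np.float64) / 255.0 * 2.0 - 1.0
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU


# Round trip on a short 8 kHz test tone: the output has the same length as
# the input but carries the quantization noise the augmentation aims to mimic.
t = np.arange(8000) / 8000.0
tone = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)
degraded = mulaw_decode(mulaw_encode(tone))
```

The quantization error grows with amplitude, which is the characteristic distortion of companded telephone audio.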
Open issues:
`phone()`. But maybe a better name is needed?
Example use:
libspandsp is required to use the lpc10 codec. Use apt-get install libspandsp-dev on Debian/Ubuntu.