Imported 8khz training audio compromised by unfiltered upsampling

mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Mozilla Public License 2.0


Imported 8khz training audio compromised by unfiltered upsampling #1726

Status: Closed (khsinclair closed this issue 5 years ago)

khsinclair commented 5 years ago

The audio for the Fisher training corpus, and possibly Switchboard as well, is originally sampled at 8 kHz with μ-law encoding. The import_fisher.py script converts it to 16 kHz PCM for training. (I'm not sure about import_swb.)

The problem is that the upsampling is done with Python's audioop.ratecv() primitive, which applies no anti-imaging low-pass filter, so the band from 4 kHz to 8 kHz is left holding a mirror image of the voice band. I'll attach spectrograms to illustrate.
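
To make the imaging effect concrete, here is a minimal sketch using only the standard library. It doubles the sample rate of a 3 kHz test tone by plain linear interpolation and then measures how much energy shows up at 5 kHz, the mirror of 3 kHz around 4 kHz. The linear interpolator is a stand-in for an unfiltered rate converter; that it behaves like audioop.ratecv() is an assumption, and the helper names are made up for illustration.

```python
import math


def tone(freq_hz, rate_hz, n):
    """Generate n samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


def upsample2_linear(samples):
    """Double the sample rate by inserting the midpoint between neighbours.

    Stand-in for an unfiltered converter (assumption, not the audioop code).
    """
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append(0.5 * (s + nxt))
    return out


def magnitude_at(samples, freq_hz, rate_hz):
    """Amplitude of the sinusoidal component at freq_hz (single DFT bin)."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / rate_hz)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / rate_hz)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)


src = tone(3000, 8000, 800)              # 0.1 s of a 3 kHz tone at 8 kHz
up = upsample2_linear(src)               # same tone, now at 16 kHz
wanted = magnitude_at(up, 3000, 16000)   # the original component
image = magnitude_at(up, 5000, 16000)    # its mirror image above 4 kHz
image_db = 20 * math.log10(image / wanted)
print(f"image at 5 kHz relative to the 3 kHz tone: {image_db:.1f} dB")
```

For this test tone the image comes out only about 7 dB below the wanted component, so content near the top of the voice band is mirrored into the high band almost at full strength.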

Properly bandlimited upsampling is not as simple as it sounds. Julius Smith has a good explanation and lists a number of good implementations here: https://ccrma.stanford.edu/~jos/resample/resample.html

The function _split_and_resample_wav in import_fisher.py should use a better upsampler. I think the sox utility, as used in the DeepSpeech Python client, does this correctly by default.
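
As a rough illustration of what a better upsampler does, the sketch below doubles the sample rate by zero-stuffing and then low-pass filtering with a Hann-windowed sinc. This is not what sox does internally and not a proposed patch for import_fisher.py; the function names, the filter length and the test tone are all assumptions chosen for illustration. In practice a tested resampler such as sox or scipy.signal.resample_poly would be the more sensible choice.

```python
import math


def halfband_taps(half_len=32):
    """Hann-windowed sinc low-pass, cutoff at a quarter of the output rate."""
    taps = []
    for k in range(-half_len, half_len + 1):
        x = k / 2.0
        sinc = 1.0 if k == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.5 * (1.0 + math.cos(math.pi * k / (half_len + 1)))
        taps.append(sinc * window)
    return taps


def upsample2_bandlimited(samples, half_len=32):
    """Double the sample rate: insert zeros, then filter out the image band."""
    taps = halfband_taps(half_len)
    stuffed = [0.0] * (2 * len(samples))
    stuffed[::2] = samples               # original samples on even indices
    n = len(stuffed)
    out = []
    for i in range(n):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + j - half_len       # filter centred on sample i
            if 0 <= idx < n:
                acc += t * stuffed[idx]
        out.append(acc)
    return out


def magnitude_at(samples, freq_hz, rate_hz):
    """Amplitude of the sinusoidal component at freq_hz (single DFT bin)."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / rate_hz)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / rate_hz)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)


# 0.1 s of a 3 kHz tone sampled at 8 kHz, upsampled to 16 kHz.
src = [math.sin(2 * math.pi * 3000 * i / 8000) for i in range(800)]
up = upsample2_bandlimited(src)
middle = up[200:1400]                    # skip the filter's edge transients
wanted = magnitude_at(middle, 3000, 16000)
image = magnitude_at(middle, 5000, 16000)
image_db = 20 * math.log10(image / wanted)
print(f"image at 5 kHz relative to the 3 kHz tone: {image_db:.1f} dB")
```

The zero-stuffed signal contains the voice band and its mirror image at equal strength; the low-pass filter is what removes the image, which is the step an unfiltered converter skips.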

khsinclair commented 5 years ago

Attached is an Audacity screenshot with two spectrograms of the same phrase, originally recorded at an 8 kHz sample rate. The top spectrogram is upsampled to 16 kHz using audioop.ratecv() exactly as import_fisher.py uses it. You can see that the frequency content from 4 kHz to 8 kHz is a reflected image of the lower band. The bottom spectrogram was upsampled to 16 kHz using the sox command line, and the high band has been properly suppressed. (Attached image: 2018-11-15_06h42_26)

khsinclair commented 5 years ago

And here's the spectrum of the first vowel in the phrase, the shaded region in the top spectrogram. There are probably only a few mel filterbank channels in the high band, but that's still a significant source of noise in the training set. (Attached image: 2018-11-15_06h41_11)
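
For a rough sense of how many mel channels the image band touches, the sketch below counts the filters whose centre frequency lies above 4 kHz. The settings are assumptions for illustration only (26 filters spaced evenly on the common 2595 * log10(1 + f / 700) mel scale from 0 Hz to 8 kHz) and are not taken from the DeepSpeech feature configuration.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)


def filter_centers_hz(n_filters, low_hz, high_hz):
    """Centre frequencies of triangular filters spaced evenly in mel."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Assumed settings, not the project's: 26 filters covering 0 Hz to 8 kHz.
centers = filter_centers_hz(26, 0.0, 8000.0)
high_band = [c for c in centers if c > 4000.0]
print(f"{len(high_band)} of {len(centers)} filter centres lie above 4 kHz")
```

Under these assumed settings, 6 of the 26 filter centres fall in the 4 kHz to 8 kHz band, so the imaged energy would reach a noticeable share of the input features.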