noisyspeech_synthesizer.py always slices from the start of the noise array

In noisyspeech_synthesizer.py, an array of audio samples is read from a noise file (line 78). On line 81, a slice of the noise array is taken from index 0 to len(clean):

noise = noise[0:len(clean)]

Because the slice always starts at index 0, when the clean speech arrays are all roughly the same length (~16000 samples, as in the speech commands case), the number of unique noise arrays we ever see is equal to the number of noise files.

Even if we have one noise file with 10 hours of audio, we may only ever make use of the first 1 second of this data.
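To put a rough number on that, here is a small back-of-the-envelope calculation. It assumes a 16 kHz sample rate (implied by ~16000 samples being about 1 second); the variable names are only for illustration.

```python
# Assumption: 16 kHz audio, so ~16000 samples is about 1 second.
sample_rate = 16000

noise_len = 10 * 3600 * sample_rate   # 10 hours of noise, in samples
clean_len = sample_rate               # one ~1 second clean utterance

# Fraction of the noise file that noise[0:len(clean)] can ever reach.
used_fraction = clean_len / noise_len

# Number of distinct start positions a random slice could use instead.
num_possible_starts = noise_len - clean_len + 1

print(f"fraction of noise file used: {used_fraction:.6%}")
print(f"possible start indices:      {num_possible_starts}")
```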

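It would be better to pick a random starting index within the noise array and take the slice from there. For example:

start_idx = np.random.randint(low=0, high=len(noise) - len(clean) + 1)
noise = noise[start_idx : start_idx + len(clean)]

Here size=1 is left out so that start_idx is a plain integer rather than a 1-element array, and high (which is exclusive) is len(noise) - len(clean) + 1 so the last valid start index can also be chosen. This assumes len(noise) >= len(clean).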
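A slightly fuller sketch of the same idea, written as a standalone helper. The function name `random_noise_segment`, the `rng` parameter, and the choice to tile a noise clip that is shorter than the clean clip are all my own assumptions, not something taken from noisyspeech_synthesizer.py.

```python
import numpy as np


def random_noise_segment(noise, target_len, rng=None):
    """Return a random contiguous slice of `noise` with `target_len` samples.

    Hypothetical helper: the name, signature, and tiling fallback are
    illustrative assumptions, not part of the existing script.
    """
    if len(noise) == 0:
        raise ValueError("noise array is empty")
    rng = np.random.default_rng() if rng is None else rng

    # Assumption: if the noise is shorter than the clean clip, repeat it
    # until it is long enough. The real script may handle this differently.
    if len(noise) < target_len:
        reps = int(np.ceil(target_len / len(noise)))
        noise = np.tile(noise, reps)

    # Generator.integers excludes the upper bound, so add 1 to make the
    # last valid start index reachable.
    max_start = len(noise) - target_len
    start_idx = int(rng.integers(0, max_start + 1))
    return noise[start_idx:start_idx + target_len]


# Usage: draw a 16000-sample noise segment from a longer noise array.
rng = np.random.default_rng(0)
noise = rng.standard_normal(160000)
clean = np.zeros(16000)
segment = random_noise_segment(noise, len(clean), rng)
```

Passing a seeded generator keeps the synthesized dataset reproducible, which may matter if the noisy files are regenerated for experiments.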

microsoft / MS-SNSD, issue #25