timsainb / noisereduce

Noise reduction in python using spectral gating (speech, bioacoustics, audio, time-domain signals)

Applying noise reduction on pyaudio stream #44

Open antondim opened 4 years ago

antondim commented 4 years ago

Hello! First of all, thank you for your great module.

I'm trying to apply real-time noise reduction to an incoming audio stream.

The settings I use when opening and reading the stream are shown in the code snippet below.

The problem I'm facing is that a periodic "fan spinning" sound appears after applying noise reduction chunk by chunk (inside the while loop), but this sound does not appear when I make a normal 5-second recording and apply noise reduction to the whole np.int16 array afterwards.

What is different is that in the first case (chunk-by-chunk noise reduction) I append the denoised data after each iteration, whereas in the second case I record for 5 seconds and THEN apply noise reduction to the whole recording.

I'm uploading example wavs to give you a better perspective:

normal_case_wavs.zip active_case_wavs.zip

P.S. I noticed that changing the number of frames I read from the stream buffer also changes the frequency of this sound. Could this be some kind of edge case where the sound marks the boundary between the sound chunks I process (and append to the list for the later .wav write)?

The crucial part of the code is here:

import numpy as np
import pyaudio
import noisereduce as nr

# `noise` is the noise-profile clip passed to reduce_noise, loaded elsewhere

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=16000)

noisy_frames = []
denoised_active_frames = []

for i in range(0, int(16000 / 16000 * 5)):  # five 1-second chunks at 16 kHz
    data = stream.read(16000)
    sound_data_npint16 = np.frombuffer(data, dtype=np.int16)  # frombuffer replaces deprecated fromstring
    noisy_frames.append(sound_data_npint16)

    sound_data_float = sound_data_npint16.astype(float) / 32768
    reduced_noise_float = nr.reduce_noise(audio_clip=sound_data_float, noise_clip=noise, verbose=False, n_fft=4096, n_std_thresh=1, pad_clipping=True)  # tried both pad_clipping=True/False
    reduced_noise_npint16 = (np.iinfo(np.int16).max * reduced_noise_float).astype(np.int16)

    denoised_active_frames.append(reduced_noise_npint16)

total_noisy_frames = np.hstack(noisy_frames)  # noisy frames gathered
total_noisy_frames_float = total_noisy_frames.astype(float) / 32768
reduced_total_noisy_frames_float = nr.reduce_noise(audio_clip=total_noisy_frames_float, noise_clip=noise, verbose=False, n_fft=4096, n_std_thresh=1, pad_clipping=True)
reduced_total_noisy_frames_npint16 = (np.iinfo(np.int16).max * reduced_total_noisy_frames_float).astype(np.int16)

# the noisy wav comes from "noisy_frames"
# the actively denoised wav comes from "denoised_active_frames"
# the denoised wav after the 5-second recording comes from "reduced_total_noisy_frames_npint16"
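
A minimal sketch that makes the chunk-boundary suspicion concrete (this is an illustration, not code from the issue; it assumes the noisereduce 1.x API used above and uses a synthetic test signal and noise profile). It denoises a 5-second clip once as a whole and once in independent 1-second chunks, then inspects how the two renderings differ around each chunk boundary:

import numpy as np
import noisereduce as nr

sr = 16000
rng = np.random.default_rng(0)
noise = 0.05 * rng.standard_normal(sr)                      # stand-in noise profile
t = np.arange(5 * sr) / sr
signal = np.sin(2 * np.pi * 440 * t) + 0.05 * rng.standard_normal(5 * sr)

# denoise the whole 5-second clip at once
whole = nr.reduce_noise(audio_clip=signal, noise_clip=noise,
                        n_fft=4096, n_std_thresh=1, pad_clipping=True)

# denoise the same clip in independent 1-second chunks and concatenate
per_chunk = np.concatenate([
    nr.reduce_noise(audio_clip=signal[i:i + sr], noise_clip=noise,
                    n_fft=4096, n_std_thresh=1, pad_clipping=True)
    for i in range(0, len(signal), sr)
])

# inspect how the two renderings differ in a small window around each boundary
n = min(len(whole), len(per_chunk))
diff = np.abs(whole[:n] - per_chunk[:n])
for boundary in range(sr, n, sr):
    print(boundary, diff[boundary - 200:boundary + 200].max())

Because each chunk is gated independently, the samples on either side of a chunk boundary come from different STFT contexts and need not line up smoothly, which would be heard once per chunk, matching the observation that the artifact's rate follows the chunk size.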
jonaslimads commented 3 years ago

Hello,

I did kind of a hack, but it seems to work:


import os
import wave

import numpy as np
import webrtcvad
from noisereduce import reduce_noise

# methods from my audio-processing class (the rest of the class is omitted)

    def __init__(self, output_to_file=True):
        self.vad = webrtcvad.Vad(int(self.vad_aggressiveness))
        self.deepspeech_model = self.load_deepspeech_model()
        self.noise_sample_data = self.load_noise_sample_data()

    def reduce_audio_noise(self, data: bytes) -> bytes:
        # int16 PCM bytes -> float array -> spectral gating -> int16 PCM bytes
        np_data = np.frombuffer(data, np.int16) / 1.0
        reduced_noise_data = reduce_noise(audio_clip=np_data, noise_clip=self.noise_sample_data)
        return reduced_noise_data.astype(np.int16).tobytes()

    def load_noise_sample_data(self) -> np.ndarray:
        # load the reference noise recording once, as a float array
        path = os.path.join(os.path.dirname(__file__), "../../../assets/deepspeech/noise_sample.wav")
        with wave.open(path, "rb") as wf:
            frames = wf.getnframes()
            return np.frombuffer(wf.readframes(frames), np.int16) / 1.0

I would then just stream the bytes returned from self.reduce_audio_noise(_bytes).

Of course, the noise sample is pretty limited, since this only handles the one noise pattern it was recorded from.

I hope that can help.
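
A minimal usage sketch for wiring this into a PyAudio read loop (the stream settings and chunk size are illustrative, and `processor` stands for an instance of the class whose methods appear above):

import pyaudio

# `processor`: instance of the class shown above (class definition omitted there)
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
                input=True, frames_per_buffer=4096)

while True:
    raw = stream.read(4096)                      # int16 PCM bytes from the mic
    clean = processor.reduce_audio_noise(raw)    # denoised int16 PCM bytes
    # hand `clean` to the VAD / DeepSpeech / an output stream here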

yujie-tao commented 1 year ago

Hello! I am also facing a similar issue with "fan spinning" artifacts when trying to apply noisereduce to real-time microphone input. Curious whether people have figured out a workaround?

DanTremonti commented 1 year ago

@yujie-tao I faced the same issue and found a workaround that mitigates it: streaming the audio as overlapping chunks reduces the artifact significantly. One side effect is increased latency.
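
A minimal sketch of that overlapping-chunk idea (not the actual code behind the comment above; the chunk size, overlap length, crossfade, and the noisereduce 1.x-style call are assumptions): each chunk is denoised together with the tail of the previous chunk, and the re-denoised overlap is crossfaded with the previously held-back tail, so chunk boundaries no longer produce a periodic artifact.

import numpy as np
import noisereduce as nr

SR = 16000
CHUNK = 16000        # 1 s of new audio per iteration
OVERLAP = 4096       # extra context carried over from the previous chunk

def denoise_stream(chunks, noise_profile):
    # `chunks` yields float arrays of CHUNK samples; `noise_profile` is a float
    # noise clip, as elsewhere in this thread. Output lags the input by OVERLAP
    # samples, which is the added latency mentioned above.
    prev_tail = np.zeros(OVERLAP)            # raw context from the previous chunk
    prev_denoised_tail = np.zeros(OVERLAP)   # denoised tail held back last time
    fade_in = np.linspace(0.0, 1.0, OVERLAP)
    fade_out = 1.0 - fade_in

    for chunk in chunks:
        block = np.concatenate([prev_tail, chunk])      # denoise with extra context
        denoised = nr.reduce_noise(audio_clip=block, noise_clip=noise_profile,
                                   n_fft=4096, n_std_thresh=1, pad_clipping=True)
        # crossfade the re-denoised overlap with the tail held back last time
        head = fade_in * denoised[:OVERLAP] + fade_out * prev_denoised_tail
        body = denoised[OVERLAP:CHUNK]                  # settled samples of this chunk
        prev_tail = chunk[-OVERLAP:]                    # raw tail for the next block
        prev_denoised_tail = denoised[CHUNK:]           # denoised tail, held back
        yield np.concatenate([head, body])

Each yielded chunk can then be rescaled to int16 and appended as in the first snippet; the OVERLAP-sample delay is the extra latency mentioned above.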