xiph / rnnoise

Recurrent neural network for audio noise reduction
BSD 3-Clause "New" or "Revised" License

The result of deVoice using the same architecture #57

Open nerv3890 opened 5 years ago

nerv3890 commented 5 years ago

First of all, thanks to the author of this project. What amazing work!

I trained the model from scratch on my own data, and the denoise result is great (data: 1.5 hr speech, 1 hr noise; 120 epochs). I then tried to train a deVoice (human voice) model using the same architecture and parameters (data: 1.5 hr noise, 1 hr speech; 50 epochs, because the loss converges after 40 epochs). In theory, I would expect the deVoice result to be as good as the denoise result, but it is not (the loss converges to 0.4).

Does anybody have an idea of how to improve the performance of deVoice? Do I need to change the model architecture or some parameters (such as the feature length)?

I would appreciate any help.

carcloudfly commented 5 years ago

I trained the model as well, but the result was not good. Could you please tell me what kind of processing you did on your dataset? I just passed my noise.raw and signal.raw to denoise_train to get the features.

nerv3890 commented 5 years ago

Do you mean the denoise model or the deVoice (human vocal) model?

carcloudfly commented 5 years ago

The denoise model. I just found out that I made a mistake while merging the raw signal files, and I'm now trying to train with the correct one. Thank you!

venkat-kittu commented 5 years ago

I am trying to train a model from scratch, and I used the same amount of data as you mentioned for denoise, but my loss does not decrease below 0.68. Can you please help me with this?

contribu commented 5 years ago

According to the paper (https://arxiv.org/pdf/1709.08243.pdf), the architecture seems to be specialized for voice denoising, so I don't think the result is strange. The VAD and the pitch filter are voice-specific features. I think the architecture exploits the domain knowledge that voice tends to alternate between silent and active periods and consists of tonal components.

contribu commented 5 years ago

Extension to use rnnoise for deVoice (not tested)

It may be possible to learn the pitch filter coefficients. The optimal gain and filter coefficients can be calculated in the training phase as follows.

$$\underset{g,\,\alpha,\,T}{\arg\min}\; \left| C - (X + P\alpha)\, g \right|$$

- g: gain of a band
- α: filter coefficients of a band
- X: noisy FFT bins of a band
- P: noisy FFT bins of a band, delayed by T
- C: clean FFT bins of a band

Maybe this modification makes it possible to use rnnoise for deVoice.
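The per-band minimization above could be sketched roughly as follows. This is only an illustration under assumptions not stated in the thread: g and α are taken to be real scalars per band, the error is measured as a sum of squared magnitudes over the bins of the band, and the delayed bins P are assumed to be precomputed for each candidate delay T. The function names `fit_gain_and_pitch_coeff` and `best_delay` are hypothetical and are not part of rnnoise. Substituting a = g and b = g·α turns the fit into an ordinary two-variable linear least-squares problem for a fixed T, and T is then chosen by trying each candidate.

```python
def fit_gain_and_pitch_coeff(C, X, P, eps=1e-12):
    """Fit real g and alpha so that C ~ (X + P*alpha) * g over one band.

    C, X, P are equal-length sequences of complex FFT bins.
    Assumption: g and alpha are real scalars (hypothetical simplification).
    """
    # With a = g and b = g*alpha the model is linear: C ~ a*X + b*P.
    sxx = sum(abs(x) ** 2 for x in X)
    spp = sum(abs(p) ** 2 for p in P)
    sxp = sum((x * p.conjugate()).real for x, p in zip(X, P))
    scx = sum((c * x.conjugate()).real for c, x in zip(C, X))
    scp = sum((c * p.conjugate()).real for c, p in zip(C, P))

    # Solve the 2x2 normal equations in closed form.
    det = sxx * spp - sxp * sxp
    if det > eps:
        a = (scx * spp - scp * sxp) / det
        b = (scp * sxx - scx * sxp) / det
    else:
        # X and P are (nearly) collinear: fall back to a gain-only fit.
        a = scx / sxx if sxx > eps else 0.0
        b = 0.0

    g = a
    alpha = b / a if abs(a) > eps else 0.0
    # Squared error of the model in its original (g, alpha) form.
    residual = sum(abs(c - g * (x + alpha * p)) ** 2
                   for c, x, p in zip(C, X, P))
    return g, alpha, residual


def best_delay(C, X, P_by_delay):
    """Try each candidate delay T and keep the one with the smallest residual.

    P_by_delay maps a delay T to the bins of the noisy signal delayed by T.
    Returns (T, g, alpha, residual).
    """
    best = None
    for T, P in P_by_delay.items():
        g, alpha, residual = fit_gain_and_pitch_coeff(C, X, P)
        if best is None or residual < best[3]:
            best = (T, g, alpha, residual)
    return best
```

The resulting g and α per band would then serve as training targets. Whether a network can learn them well enough for deVoice is exactly the untested part of the idea.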

pranshurastogi29 commented 4 years ago

Hey @nerv3890, how did you train the model? When I run ~/rnnoise/src/denoise_training speech_only.pcm noise_only.pcm output.f32, I get a "No such file or directory" error. How did you train and dump the model?