Open nerv3890 opened 5 years ago
First of all, thanks to the author of this project. What amazing work!
I have trained the model from scratch on my own data, and the denoise result is great (data: 1.5 hr speech, 1 hr noise, 120 epochs). Then I tried to train a deVoice model (one that removes the human voice) using the same architecture and parameters (data: 1.5 hr noise, 1 hr speech, 50 epochs, since the loss converges after 40 epochs). Theoretically, the deVoice result should be as good as the denoise result, but it is not (the loss converges to 0.4).
Does anybody have an idea of how to improve the deVoice performance? Do I need to change the model architecture or some parameters (like the feature length)?
I'd appreciate any help.
I trained the model as well, but the result was not good. Could you please tell me what kind of preprocessing you did on your dataset? I just fed my noise.raw and signal.raw into denoise_training to get the features.
Do you mean the denoise model or the deVoice (human vocal) model?
Denoise. And I just found out that I made a mistake while merging the signal raw files; now I'm retraining with the correct one. Thank you!
I am trying to train a model from scratch with the same amount of data as you mentioned for denoise, but my loss is not decreasing beyond 0.68. Can you please help me with this?
According to the paper https://arxiv.org/pdf/1709.08243.pdf, the architecture seems to be specialized for voice denoising, so I don't think the result is strange. The VAD and the pitch filter are voice-specific features. The architecture exploits the domain knowledge that voice alternates between silent and active periods and consists of tonal components.
It may be possible to learn the pitch filter coefficients instead. The optimal gain and filter coefficients can be computed in the training phase as follows:
argmin_{g, α, T} |C - (X + Pα) g|

where, for each band:
g: gain
α: pitch filter coefficient
X: noisy FFT bins
P: noisy FFT bins delayed by T
C: clean FFT bins
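For a fixed delay T, the objective above is linear in the products (g, αg): substituting b = αg turns it into an ordinary least-squares problem C ≈ Xg + Pb. A minimal sketch of solving one band this way, using NumPy on synthetic real-valued data (toy values; real FFT bins would be complex, and T would be searched over candidate pitch lags):

```python
import numpy as np

# Toy per-band example of  argmin_{g, a} |C - (X + P*a) * g|  for fixed T.
# Substituting b = a*g makes it linear:  C ≈ X*g + P*b.
rng = np.random.default_rng(0)
n = 64                            # FFT bins in one band (toy size)
X = rng.normal(size=n)            # noisy bins of the band
P = rng.normal(size=n)            # noisy bins delayed by T

g_true, a_true = 0.7, 0.3
C = (X + P * a_true) * g_true     # synthetic "clean" target

A = np.stack([X, P], axis=1)      # design matrix [X | P]
(g, b), *_ = np.linalg.lstsq(A, C, rcond=None)
a = b / g                         # recover the filter coefficient

print(round(g, 3), round(a, 3))   # → 0.7 0.3
```

The outer minimization over T would repeat this solve for each candidate delay and keep the one with the smallest residual.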
Maybe this modification would make it possible to use RNNoise for deVoice.
Hey @nerv3890, how did you train the model? When I run ~/rnnoise/src/denoise_training speech_only.pcm noise_only.pcm output.f32 I get a "No such file or directory" error. How did you train and dump the model?