Closed jzi040941 closed 3 years ago
Hi Noah,
Thanks for opening this page for discussion! I've been working on a similar project (PercepNet for AEC) recently, and I've run into some trouble with the pitch coherence calculation, so I was wondering whether I did something wrong in this part.
Firstly, I applied the comb filter to the clean speech x[n] to get p-hat[n], then I calculated the pitch coherence:
Secondly, since the paper says p is not available, I used p-hat to calculate the pitch coherence of x, y and p-hat. This is where the problem appears: the comb-filter ratio r computed from these coherences does not stay within 0~1, and sometimes alpha is even NaN because the value inside the square root is negative:
So I have several questions about this part and would appreciate some ideas:
Thank you!
Hi, Oscar
Here is my answer, in my opinion.
If you have a NaN problem, my suggestion is to clip your data. Sometimes the pitch coherence value goes negative; my solution was to set it to 0 whenever it is below 0.
What I wrote is based on my own understanding and could be wrong, so if anyone has intuition about this, feel free to leave a comment. Thanks!
Thanks Noah! But I still got a NaN on alpha after following your suggestion, so I took the absolute value of the pitch coherence to work around it.
However, the abnormal comb-filter strength remained: some of the values I got were outside 0~1. Did you have this problem when you clipped the pitch coherence? I don't know whether I should clip the comb-filter strength as well.
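To make the difference between the two workarounds concrete, here is a minimal C sketch. The helper names `clip_coherence` and `abs_coherence` are hypothetical and are not taken from the repository, and the upper clamp at 1 is an assumption based on the 0~1 range mentioned in this thread. Clipping treats a negative coherence as "no correlation", whereas the absolute value turns it into a positive correlation of the same size, so the two can lead to different filter strengths.

```c
#include <assert.h>

/* Hypothetical helper: clamp a pitch coherence value to [0, 1]. */
float clip_coherence(float q) {
    assert(q == q);  /* reject NaN: it would slip through both comparisons */
    if (q < 0.0f) return 0.0f;
    if (q > 1.0f) return 1.0f;
    return q;
}

/* Hypothetical helper: the absolute-value workaround. */
float abs_coherence(float q) {
    return q < 0.0f ? -q : q;
}
```

For an input of -0.3, clipping yields 0 while the absolute value yields 0.3, so the choice changes the label that is later used for training.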
Hi Noah,
Thanks for sharing your code!
I'm now running your latest data generation code. Looking at the results for two sample files, I have a question about the data generation step. The calculated gains of the first five frames are always 0; I think this is caused by the look-ahead in the original paper. I think the first calculated gain and filter strength are the labels for frame -5 when you input frame 0, so training the model may require shifting the gain and filter strength outputs 5 frames ahead. Could you please tell me whether I've made a mistake? Thanks!
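Hi, Oscar! Let me show you my filter strength calculation function and explain it.
#include <math.h>

// NB_BANDS (the number of bands) is defined elsewhere in the project.
void filter_strength_calc(float *Exp, float *Eyp, float *Ephatp, float *r){
  float a, b, c, alpha;
  for(int i=0; i<NB_BANDS; ++i){
    a = Ephatp[i]*Ephatp[i] - Exp[i]*Exp[i];
    if (a<0) a=0;  // clip: a is expected to be non-negative
    b = Ephatp[i]*Eyp[i]*(1-Exp[i]*Exp[i]);
    c = Exp[i]*Exp[i]-Eyp[i]*Eyp[i];
    if (c<0) c=0;  // clip: Exp < Eyp is treated as an outlier
    // small epsilon in the denominator avoids dividing by zero when a == 0
    alpha = (sqrt(b*b + a*c)-b)/(a+1e-8);
    r[i] = alpha/(1+alpha);
  }
}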
I also applied clipping to each term, not only to the pitch coherence. For me the a and c terms are sometimes lower than 0, so I force them to 0. I thought that ideally Exp is larger than Eyp (Exp>Eyp): because y is x+noise, the correlation between Y and P (Eyp) should be lower than Exp. But in practice, at some times or in some bands, Exp is smaller than Eyp (Exp<Eyp). I assume such data are outliers, so I set the term to 0. Finally, I add a small epsilon to the denominator of the alpha term to prevent NaN errors.
The result of this function always lies in 0~1. (But I'm still wondering whether it's correct or not.)
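Written out per band, the function above computes the following. The symbols $q_x$, $q_y$ and $q_{\hat p}$ are notation chosen for this note and stand for `Exp[i]`, `Eyp[i]` and `Ephatp[i]`; the two $\max$ operations and the $10^{-8}$ are the code's safeguards, not necessarily part of the paper's closed-form expression.

```latex
\begin{aligned}
a &= \max\!\left(q_{\hat p}^2 - q_x^2,\ 0\right), \\
b &= q_{\hat p}\, q_y \left(1 - q_x^2\right), \\
c &= \max\!\left(q_x^2 - q_y^2,\ 0\right), \\
\alpha &= \frac{\sqrt{b^2 + a\,c} - b}{a + 10^{-8}}, \qquad
r = \frac{\alpha}{1 + \alpha}.
\end{aligned}
```

This explains the observed range, at least in exact arithmetic: with $a \ge 0$ and $c \ge 0$ the argument of the square root is never negative, and $\sqrt{b^2 + ac} \ge |b| \ge b$ gives $\alpha \ge 0$, so $0 \le r < 1$. It says nothing about whether the clipped values are the right training labels, which is the open question in the comment above.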
Hi, @sTarAnna!
Thanks for checking my code. The reason the first five outputs are zero is comb_buf. comb_buf is needed for the comb filtering implementation, and I use a buffer of 5 frames in comb_buf, so it takes 5 frames to get the first output. I also buffered Y, not only the outputs (r, g); you can check that the first five values of Ey and EphatY in the result are also zero.
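The effect described above can be illustrated with a small delay line. This is only a sketch: `DelayLine`, `delay_init` and `delay_push` are hypothetical names, the buffer holds one scalar per frame rather than the real comb_buf contents, and the 5-frame length is taken from the explanation above.

```c
#include <assert.h>
#include <string.h>

#define DELAY_FRAMES 5  /* assumed to match the 5-frame buffering described above */

/* Hypothetical delay line holding one scalar per frame (e.g. one band gain). */
typedef struct {
    float buf[DELAY_FRAMES];
    int pos;
} DelayLine;

void delay_init(DelayLine *d) {
    assert(d != NULL);
    memset(d, 0, sizeof(*d));  /* all-zero bytes are 0.0f on IEEE 754 targets */
}

/* Push the newest value and return the one pushed DELAY_FRAMES calls earlier.
   The first DELAY_FRAMES calls return the zero-initialised contents. */
float delay_push(DelayLine *d, float in) {
    float out = d->buf[d->pos];
    d->buf[d->pos] = in;
    d->pos = (d->pos + 1) % DELAY_FRAMES;
    return out;
}
```

Pushing the values 1, 2, 3, ... returns 0 five times and then 1, 2, and so on. If the input features and the labels pass through the same delay, as the reply above suggests they do, they remain aligned frame by frame, which would be why no extra shift is needed.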
Thanks for your reply! I have found my mistake.
Hi Noah, thanks for sharing!
In recent days I've been working on post-filtering, trying to integrate the whole PercepNet pipeline before training the NN. I have provisionally finished the post-filtering part and obtained the final result x_hat. However, the result shows no sign of good performance with either my adaptation (taking the absolute value of all coherences and alpha) or your suggestion (data clipping). I was wondering whether something else is going wrong, perhaps in the post-filtering part. I'm not sure whether you have already tried post-filtering, so I've listed the problems I have:
Taking one set of training data, I think that if I use these two parameters to process the noisy signal y with PercepNet, the output x_hat should be close to the clean x, right?
Since this calculation includes a sine function, the warped gain can sometimes be negative; I don't know whether this is reasonable. As usual, I took the absolute value of the warped gain to avoid NaN when calculating the global gain G:
in which I simply took E0 and E1 from eq.(2) and eq.(13) respectively.
In signal processing, we often calculate the sum of squared signal values as the energy. However, in a previous section of this paper, the same symbol is defined as the 2-norm of the signal:
While working on the envelope post-filtering section, I was not sure whether I should take the 2-norm or the squared value as the energy. I chose the former for all calculations in PercepNet.
If you have any suggestions or comments, please share them with me. Thank you!
Hi, Oscar. I haven't tried post-filtering yet, but yesterday I got good performance applying only the gain and filter strength with my implementation (data clipping). I recommend checking the band gain multiplication in the STFT domain, which was the problem for me. How about testing your x_hat without post-filtering?
Yes, you're right. I used the strength and gain as you mentioned: I applied these two parameters to Y to get X_hat, and the resulting X_hat is very close to x, which means the noise was removed.
Since I haven't tried post-filtering yet, I don't have much intuition about it, so I have questions rather than answers (I hope other people can answer you). Why could the warped gain be negative? The range of gb is 0~1, right? Then the argument of the sine function must be in 0~pi/2, and since sin(0)=0 and sin(pi/2)=1, I think it cannot become negative. One more thing: could you explain what E0 and E1 are?
I think you should use the squared value. RNNoise on GitHub, the earlier work by the author of PercepNet, calculates the band energy with squared values. I applied the same approach as RNNoise, like below:
void compute_band_energy(float *bandE, const kiss_fft_cpx *X) {
  // ...
  for (j=0;j<band_size;j++) {
    float tmp;
    // relative position of bin j inside band i, starting at 0
    float frac = (float)j/band_size;
    // squared magnitude of the bin: re^2 + im^2 (no square root)
    tmp = SQUARE(X[(erb_band->nfftborder[i]) + j].r);
    tmp += SQUARE(X[(erb_band->nfftborder[i]) + j].i);
    // share the energy between band i and band i+1 with triangular weights
    sum[i] += (1-frac)*tmp;
    sum[i+1] += frac*tmp;
  }
  // ...
}
Thanks!
Hi Noah,
Thanks for testing PercepNet's performance! Regarding 1 and 2, I think I understand why my gb is sometimes larger than 1: I'm using PercepNet for a different application.
In my AEC application, the noisy y is the AEC output, i.e. the echo-cancelled signal. As a result, it's reasonable that the energy of y is sometimes lower than that of the near-end clean speech x.
E0 and E1 are the energies of the enhanced signal using gb and g_warped respectively. If the signal energy is the sum of squares, then (E0/E1) should be (gb/g_warped)^2.
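The ratio stated above follows directly for a single band. In the sketch below, $X_k$ are the bins of that band and $g_b$, $g_w$ are the plain and warped gains; the symbols are notation chosen for this note.

```latex
\begin{aligned}
E_0 &= \sum_k \left| g_b X_k \right|^2 = g_b^2 \sum_k |X_k|^2, \\
E_1 &= \sum_k \left| g_w X_k \right|^2 = g_w^2 \sum_k |X_k|^2, \\
\frac{E_0}{E_1} &= \left( \frac{g_b}{g_w} \right)^2 .
\end{aligned}
```

If E0 and E1 are instead summed over several bands with different gains, the common factor no longer cancels. Assuming non-overlapping bands with energies $E_b$, the ratio becomes $\sum_b g_b^2 E_b \,/\, \sum_b g_{w,b}^2 E_b$, which is an energy-weighted quantity rather than a single squared gain ratio.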
Therefore, for PercepNet in the AEC application, I will continue modifying the DNS-Challenge PercepNet to suit AEC, especially the energy-related calculations.
Thanks!
Is there a pre-trained model for PercepNet?
Hi Noah, thank you for sharing! I'm stuck on the formula for computing the gain attenuation term. In your code, that is: the squares of Exp and Ephatp can be any non-negative numbers, so the value under the square root may be negative. How can I solve this problem?
There's no pre-trained model yet. Feel free to contribute if you already have one!
I recommend setting Exp and Ephatp to 0 if they are lower than 0 before adjusting the gain strength, like below:
// clip negative coherence values to 0 before they are squared
for (int i=0; i<NB_BANDS; ++i){
  if(Ephatp[i]<0) Ephatp[i] = 0;
  if(Exp[i]<0) Exp[i] = 0;
}
Hi, Oscar
Here is my answer, in my opinion.
- I apply the comb filter to X to obtain p-hat, and I also apply the comb filter to Y to obtain another p-hat (I'm not sure this is correct).
- I think the reasonable range of pitch coherence is 0~1.
- I use the attenuation term only for the pitch coherence of p-hat; the coherence of x and y is calculated by equation (5). I also assume p is p-hat when calculating pitch coherence.
If you have a NaN problem, my suggestion is to clip your data. Sometimes the pitch coherence value goes negative; my solution was to set it to 0 whenever it is below 0.
What I wrote is based on my own understanding and could be wrong, so if anyone has intuition about this, feel free to leave a comment. Thanks!
Hello, I am working on PercepNet AEC again now! Can we have a talk? Maybe we can help each other.
Sorry I was late. Good to see you again, and of course we can help each other! Feel free to use this page: https://github.com/jzi040941/PercepNet/discussions
@jzi040941 Thanks for your great work. Do you have any plans for RES+NS based on PercepNet?
I got an email from Yuyung Liau, who asked me to open a discussion as a GitHub issue. Since I've received a few emails about implementing PercepNet, I think it's better to share with more people rather than answering one by one, so I opened this issue. Any kind of discussion about PercepNet is welcome!