Closed: jeongHwarr closed this issue 2 years ago.
Hi,
Global variance equalization is not included in this sednn project. Your result shows magnitude clipping.
You should apply some normalization, and also control the magnitude of the speech when it is stored as a 16-bit WAV.
Best regards, yong
Hi Yong Xu,
I really appreciate your project. I have some questions. First, does this project include global variance equalization? If so, in which part?
Also, I trained this model with my own data (about 20 hours of training data). However, the PESQ results of the enhanced audio were worse than the PESQ of the noisy audio compared with the clean audio.
I found that the enhanced waveforms look strange.
This is the mixed audio: https://user-images.githubusercontent.com/16534413/45803808-7920c400-bcf5-11e8-82bd-91e47968649c.png
And this is the enhanced audio: https://user-images.githubusercontent.com/16534413/45803780-61494000-bcf5-11e8-9571-5a8cad148340.png
Why did this result come out?
Thank you!
I met the same problem... Did you solve it? Thank you!!!
Hi,
Regarding the clipping phenomenon, please check whether you applied the normalization (subtract the mean and divide by the standard deviation) correctly to both the training and testing data.
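For illustration, here is a minimal sketch of that kind of mean/variance normalization; the variable names (train_log_mag, test_log_mag, pred) are placeholders, not the actual sednn code:

```python
import numpy as np

# Illustrative sketch only (not the actual sednn code).
# Placeholder features: (n_frames, n_freq) log-magnitude spectra.
train_log_mag = np.random.randn(1000, 257)
test_log_mag = np.random.randn(200, 257)

# Fit the normalization statistics on the TRAINING data only.
mean = train_log_mag.mean(axis=0)
std = train_log_mag.std(axis=0) + 1e-8      # guard against division by zero

x_train = (train_log_mag - mean) / std      # training stage
x_test = (test_log_mag - mean) / std        # test stage: reuse the SAME statistics

# After the network predicts normalized log-magnitudes, invert the
# transform before reconstructing the waveform.
pred = x_test                               # stand-in for the network output
enhanced_log_mag = pred * std + mean
```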
Best wishes,
Qiuqiang
@qiuqiangkong I met the problem on the test dataset. I made the test dataset according to the code. I notice that you use normalization in the preprocessing, but in the "recover wav" processing there is no code about normalization. Maybe that is the reason. I would be thankful if you could fix it! In the meantime, I notice the code doesn't have an inference part. (I wrote that part, but the experiment result still shows the clipping phenomenon.)
I solved this problem. I modified the inference part in main_dnn.py. In the "Recover enhanced wav" part, there is some code that compensates for the amplitude. I commented out 's *= np.sqrt((np.hamming(n_window)**2).sum())' (line 263). As a result, I was able to extract the resulting file without clipping. But I think normalization seems to be the more fundamental solution. In my case, the amplitude of each audio file was very different, so it seems that the clipping phenomenon occurred.
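For anyone hitting the same clip, an alternative to deleting the window-compensation line is to limit the peak before writing the 16-bit WAV, which is what the advice above about controlling the magnitude amounts to. A rough, hypothetical sketch (not the sednn code; it assumes the soundfile package):

```python
import numpy as np
import soundfile as sf

def write_16bit_wav(path, audio, sample_rate):
    """Peak-limit the enhanced signal only if it would clip, then write 16-bit PCM."""
    peak = np.max(np.abs(audio))
    if peak > 1.0:                       # values outside [-1, 1] clip when quantized to int16
        audio = audio / peak * 0.99
    sf.write(path, audio, sample_rate, subtype='PCM_16')
```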
Glad to hear that. By the way, note that in the testing stage the system should use the normalization scaler calculated from the training stage.
Best wishes,
Qiuqiang
Hi,
If you are testing real-world noisy speech, please use my other code: https://github.com/yongxuUSTC/DNN-Speech-enhancement-demo-tool. It already accounts for the magnitude difference and uses a better normalization strategy.
If you really want to use the Python code, I suggest you subtract the utterance mean first and then apply global mean-variance normalization. During the reconstruction stage, you can compute pred * STD + Global_Mean + utterance mean.
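A minimal sketch of that two-step scheme, assuming (n_frames, 257) log-magnitude features; the function and variable names are illustrative and may differ from the actual repository code:

```python
import numpy as np

def two_level_normalize(log_mag, global_mean, global_std):
    """Subtract the per-utterance mean, then apply global mean-variance normalization."""
    utt_mean = log_mag.mean(axis=0, keepdims=True)   # (1, 257) utterance mean
    x = (log_mag - utt_mean - global_mean) / global_std
    return x, utt_mean

def two_level_reconstruct(pred, global_mean, global_std, utt_mean):
    """Reconstruction: pred * STD + Global_Mean + utterance mean."""
    return pred * global_std + global_mean + utt_mean
```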
Best regards, yong
Thank you! However, I have some more questions. In the function named inference in main_dnn.py, the alpha value that was used for normalization is not used. Is it okay that it is not used? And I want to know the exact meaning of the scaler value. What method is used for scaling?
Hi, we subtract the mean and divide by the standard deviation of the input, for both training and inference. I think the alpha you mentioned is just a mixing coefficient from the mixing procedure.
Best wishes,
Qiuqiang
Hi, I want to know the meaning of the utterance mean. In my view, the utterance mean is the mean of each T-F bin of the clean speech. Could you be so kind as to help me with this question? Thank you for your kindness. @yongxuUSTC
Hi,
The utterance mean is the average over all the frames of an utterance. Its dimension is (257, 1), where 257 is the number of frequency bins.
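In code, that is an average over the time axis of one utterance, per frequency bin (a hypothetical sketch assuming a (257, n_frames) layout):

```python
import numpy as np

spec = np.abs(np.random.randn(257, 120))       # placeholder: 257 frequency bins x 120 frames
utt_mean = spec.mean(axis=1, keepdims=True)    # average over frames -> shape (257, 1)
```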
yong