yongxuUSTC / sednn

deep learning based speech enhancement using keras or pytorch, make it easy to use
http://staff.ustc.edu.cn/~jundu/The%20team/yongxu/demo/SE_DNN_taslp.html

About global variance #19

Closed jeongHwarr closed 2 years ago

jeongHwarr commented 6 years ago

Hi Yong Xu,

I really appreciate your project. I have some questions. First, does this project include global variance equalization? If so, in which part?

I also trained this model on my own data (about 20 hours of training data). However, the PESQ scores of the enhanced audio were worse than those of the unprocessed noisy audio compared against the clean reference.

I found that the enhanced waveforms look strange.

Below is the mixed (noisy) audio: [image]

And below is the enhanced audio: [image]

Why did this result come out?

Thank you!

yongxuUSTC commented 6 years ago

Hi,

Global variance equalization is not included in this sednn project. Your result shows magnitude clipping.

You should apply some normalization, and also control the magnitude of the speech when storing it into a 16-bit WAV file.
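A minimal sketch of that advice, assuming float waveforms in NumPy (the helper name and the 0.95 headroom are illustrative choices, not sednn's actual code):

```python
import numpy as np

def to_int16(audio, peak=0.95):
    """Scale a float waveform so it fits into 16-bit PCM without clipping.

    Only attenuates when the peak exceeds the target level; quiet signals
    are left untouched so relative levels are preserved.
    """
    max_abs = np.max(np.abs(audio))
    if max_abs > peak:
        audio = audio * (peak / max_abs)
    return (audio * 32767.0).astype(np.int16)
```

The resulting int16 array can then be written with any WAV writer (e.g. the stdlib `wave` module).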

Best regards, yong



maogewudi007 commented 6 years ago

I met the same problem... Did you solve it? Thank you!

qiuqiangkong commented 5 years ago

Hi,

For the clipping phenomenon, please check whether you applied the normalization (subtract the mean and divide by the standard deviation) correctly on both the training and testing datasets.
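A minimal sketch of that discipline, assuming log-magnitude features of shape (n_frames, 257) (the variable names and dummy data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
train_feats = rng.normal(loc=3.0, scale=2.0, size=(1000, 257))  # dummy features
test_feats = rng.normal(loc=3.0, scale=2.0, size=(200, 257))

# Fit the statistics on the TRAINING set only...
mean = train_feats.mean(axis=0)
std = train_feats.std(axis=0) + 1e-8  # guard against division by zero

# ...and reuse exactly the same scaler on the test features.
train_norm = (train_feats - mean) / std
test_norm = (test_feats - mean) / std  # do NOT recompute mean/std here
```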

Best wishes,

Qiuqiang



maogewudi007 commented 5 years ago

@qiuqiangkong I met the problem on the test dataset, which I built according to the code. I noticed that normalization is used in the preprocessing, but there is no normalization code in the "recover wav" step. Maybe that is the reason; I would be thankful if you could fix it! Meanwhile, I noticed the code does not have an inference part. (I wrote that part myself, but the experimental result still shows the clipping phenomenon.)

jeongHwarr commented 5 years ago

I solved this problem. I modified the inference part in main_dnn.py. In the "Recover enhanced wav" part, there is code that compensates the amplitude. I commented out 's *= np.sqrt((np.hamming(n_window)**2).sum())' (Line 263). As a result, I was able to extract the resulting file without clipping. But I think normalization would be a more fundamental solution. In my case, the amplitude of each audio file was very different, so the clipping phenomenon seems to have occurred.
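To see why that line can cause clipping, here is a toy illustration (the function is hypothetical, not sednn's actual recovery code; n_window = 512 is assumed):

```python
import numpy as np

n_window = 512  # assumed analysis window length (matches a 257-bin spectrum)

def scale_frames(frames, compensate=True):
    """Apply (or skip) the Hamming-energy amplitude compensation.

    The factor sqrt(sum(hamming^2)) is roughly 14 for a 512-sample window,
    so with compensate=True the recovered samples can be boosted well past
    the [-1, 1] range and get clipped when written as 16-bit PCM.
    """
    s = frames.copy()
    if compensate:
        s *= np.sqrt((np.hamming(n_window) ** 2).sum())
    return s
```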

qiuqiangkong commented 5 years ago

Glad to hear that. By the way, note that in the testing stage the system should use the normalization scaler calculated during the training stage.

Best wishes,

Qiuqiang



yongxuUSTC commented 5 years ago

Hi,

If you are testing real-world noisy speech, please use my other repository: https://github.com/yongxuUSTC/DNN-Speech-enhancement-demo-tool It already accounts for the magnitude difference and uses a better normalization strategy.

If you really want to use this Python code, I suggest subtracting the utterance mean first, then applying global mean-variance normalization. During the reconstruction stage, you can compute pred*STD + Global_Mean + utterance mean.
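That recipe, as a hedged sketch (the function names are mine; the global statistics would come from the training set):

```python
import numpy as np

def normalize(feat, global_mean, global_std):
    """Remove the per-utterance mean, then apply global mean-variance norm.

    feat: (n_frames, n_bins) features of one utterance. Returns the
    normalized features plus the utterance mean needed for reconstruction.
    """
    utt_mean = feat.mean(axis=0, keepdims=True)
    centered = feat - utt_mean
    return (centered - global_mean) / global_std, utt_mean

def reconstruct(pred, global_mean, global_std, utt_mean):
    """Invert the normalization: pred * STD + Global_Mean + utterance mean."""
    return pred * global_std + global_mean + utt_mean
```

Round-tripping a feature matrix through `normalize` and `reconstruct` recovers it exactly, which is an easy sanity check before plugging in real network predictions.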

Best regards, yong


jeongHwarr commented 5 years ago

Thank you! However, I have some more questions. In the function named inference in main_dnn.py, you do not use the alpha value that was used for normalization. Is it okay that this is not used? I would also like to know the exact meaning of the scaler value. What method is used for scaling?

qiuqiangkong commented 5 years ago

Hi, we subtract the mean and divide by the standard deviation of the input, for both training and inference. I think the alpha you mentioned is just a mixing coefficient used in the mixing procedure.

Best wishes,

Qiuqiang



zyy341 commented 5 years ago

Hi, I want to know the meaning of the utterance mean. In my view, the utterance mean is the mean of each T-F bin of the clean speech. Could you kindly help me with this question? Thank you for your kindness. @yongxuUSTC


yongxuUSTC commented 5 years ago

Hi,

The utterance mean is the average over all frames. Its dimension is (257, 1), where 257 is the number of frequency bins.
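In NumPy terms (dummy data, just to show the shapes; 257 bins corresponds to a 512-point FFT):

```python
import numpy as np

spec = np.abs(np.random.randn(257, 120))  # |spectrogram|: 257 bins x 120 frames

# Average over the frame axis only; each frequency bin keeps its own mean.
utt_mean = spec.mean(axis=1, keepdims=True)
print(utt_mean.shape)  # (257, 1)
```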

yong
