yongxuUSTC / sednn

Deep learning based speech enhancement using Keras or PyTorch, made easy to use
http://staff.ustc.edu.cn/~jundu/The%20team/yongxu/demo/SE_DNN_taslp.html

some question of pad_with_border #8

Open Nickkk1124 opened 6 years ago

Nickkk1124 commented 6 years ago

Hello: I'm really impressed by your work and have a few questions about how you process the data.

[image: Nick's diagram of pad_with_border]

Does pad_with_border mean this?

Many thanks, Nick

Nickkk1124 commented 6 years ago

Sorry, one more question: I would like to use this speech enhancement system as a front end for ASR. How do I do this?

Many thanks, Nick

qiuqiangkong commented 6 years ago

Hi Nick,

The picture you show is correct. pad_with_border simply extends the left and right borders.
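For reference, here is a minimal sketch of that border padding, assuming the function simply repeats the first and last frames n_pad times so that edge frames can still sit at the center of a full context window (treat the exact signature as an assumption about the repository's code):

```python
import numpy as np

def pad_with_border(x, n_pad):
    """Repeat the first and last frames of a (n_frames, n_freq)
    spectrogram n_pad times on each side."""
    x_pad_list = [x[0:1]] * n_pad + [x] + [x[-1:]] * n_pad
    return np.concatenate(x_pad_list, axis=0)

x = np.arange(10).reshape(5, 2).astype(float)   # 5 frames, 2 freq bins
y = pad_with_border(x, n_pad=3)
print(y.shape)   # (11, 2): 3 copies of frame 0, the 5 frames, 3 copies of frame 4
```

With this padding, a 7-frame context window centered on the original first frame is always fully populated.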

You may obtain enhanced speech by running this code. ASR can then be applied post hoc.

Best wishes,

Qiuqiang



Nickkk1124 commented 6 years ago

Hello Qiuqiang,

mat_2d_to_3d converts the features to shape (n_segs, n_concat, n_freq).

The center frame of the first stacked segment is t=1, so shouldn't the center frame of the second segment be t=2?

But as shown in the figure below, why is the center frame of the second segment t=4?

[image: diagram of stacked frames with center frames t=1 and t=4]

Many thanks,

Nick

yongxuUSTC commented 6 years ago

Hi Nick,

Yes, you can use the enhanced features for ASR. But you should probably retrain or jointly train your back-end acoustic model for ASR.

Good luck.

Best regards, yong


Dr. Yong XU https://sites.google.com/view/xuyong/home


Nickkk1124 commented 6 years ago

Hi Yong,

Thank you for your reply! There are some questions I'd like to ask:

  1. By "enhanced features for ASR", do you mean the log power spectrogram magnitudes?

  2. Do you think using the recovered enhanced wav as ASR input is feasible?

  3. What would you recommend for applying the enhancement system to environmental noise?

Many thanks, Nick

qiuqiangkong commented 6 years ago

Hi Nick,

The picture you drew is correct: the center frames in your drawing are t=1 and t=4. It depends on the hop size.
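A minimal sketch of the segmentation makes the hop dependence concrete. This is a hypothetical implementation of mat_2d_to_3d, assuming the segment start index simply advances by hop frames each step; with hop=3 the center frames land on t=1, t=4, t=7, matching the drawing:

```python
import numpy as np

def mat_2d_to_3d(x, agg_num, hop):
    """Roll a (n_frames, n_freq) matrix into overlapping segments of
    agg_num frames, advancing the start index by hop each time."""
    n_frames = len(x)
    segs = []
    i = 0
    while i + agg_num <= n_frames:
        segs.append(x[i:i + agg_num])
        i += hop
    return np.array(segs)   # (n_segs, agg_num, n_freq)

x = np.arange(10)[:, None].astype(float)    # frame index used as the feature
segs = mat_2d_to_3d(x, agg_num=3, hop=3)
print(segs[:, 1, 0])   # center frames: [1. 4. 7.]
```

Setting hop=1 instead would give one segment per frame, with consecutive center frames t=1, t=2, t=3, ...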

"The "enhanced features for ASR" you mentioned, do you mean the magnitudes of log power spectrogram?"

"Do you think using recover enhanced wav as ASR input is feasible?"

It is feasible if the dataset is small. However, bear in mind that any speech denoising method will lose some information. Some work did a joint enhancement and recognition.

"What would you recommend about applying the enhancement system to dealing with the environmental noise?"

Best wishes,

Qiuqiang



akshayaCap commented 6 years ago

Hello Qiuqiang,

This is great work. It would be of great help if you could elaborate on the points below from the discussion above.

"- method will lose some information. Some work did a joint enhancement and recognition."

I get the point about information loss. Could you please say more about joint enhancement and recognition?

Is it two interlinked DNN models, or preprocessing followed by ASR?

Thank-you.

qiuqiangkong commented 6 years ago

Hi akshayaCap,

If speech enhancement and ASR are done separately, ASR performance might be reduced, because speech enhancement can also remove some useful information from the speech. However, combining them into a single neural network might help: for example, use speech enhancement as the lower layers of the network and ASR as the higher layers, with a loss function that combines the ASR and speech enhancement objectives. This is just my conjecture, and I am not aware whether such work exists.
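As a purely hypothetical illustration of that combined objective (not code from this repository), the joint loss could be a weighted sum of the two per-task losses, with a weight trading off denoising quality against recognition accuracy:

```python
def joint_loss(se_loss, asr_loss, alpha=0.5):
    """Weighted combination of the speech-enhancement loss (e.g. MSE on
    spectrogram frames) and the ASR loss (e.g. cross-entropy or CTC).
    alpha=1.0 trains enhancement only; alpha=0.0 trains ASR only."""
    return alpha * se_loss + (1.0 - alpha) * asr_loss

# Toy numbers just to show the interpolation between the two objectives.
total = joint_loss(se_loss=0.8, asr_loss=0.2, alpha=0.5)
print(total)   # 0.5
```

In a real joint system both losses would be computed on the same forward pass, so gradients from the ASR loss also flow back into the enhancement layers.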

Best wishes,

Qiuqiang



yongxuUSTC commented 6 years ago

Hi akshayaCap,

Yes, there are joint SE & ASR training papers: https://www.isca-speech.org/archive/interspeech_2014/i14_0616.html https://ieeexplore.ieee.org/abstract/document/7178797/

Best regards, yong


Yong XU https://sites.google.com/view/xuyong/home


akshayaCap commented 6 years ago

Dear Yong, "Yes, there are joint SE & ASR training papers: https://www.isca-speech.org/archive/interspeech_2014/i14_0616.html https://ieeexplore.ieee.org/abstract/document/7178797/" It was an informative read. It would be great if you could post a link to their implementation (source code).

Thank-you, Akshaya