yongxuUSTC / sednn

deep learning based speech enhancement using Keras or PyTorch, made easy to use
http://staff.ustc.edu.cn/~jundu/The%20team/yongxu/demo/SE_DNN_taslp.html

Difficulty with training model for multiple SNR #17

Open akshayaCap opened 6 years ago

akshayaCap commented 6 years ago

Hi Yong Xu,

Thank you for your prompt replies to my queries. I have been able to train a model for a single SNR. Now I am trying to train a model on 5 SNRs (0 dB, 10 dB, 15 dB, 20 dB, 30 dB). I have generated the combined data.h5 file, but it has a huge size of 27 GB. Now, when I try to read this file to generate scalar.p, my system hangs. Machine config: GTX 1060, 32 GB RAM, 1 TB hard drive. Can you please help me out with this?

Thanks Akshaya

qiuqiangkong commented 6 years ago

Hi Akshaya,

I think this file might be too big for your 32 GB of memory.

You can either reduce the size of your data, or alternatively mix the speech with the noise on the fly, so that you effectively have an infinite number of mixtures.
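
If it helps, "mixing on the fly" could look roughly like the sketch below. The helper name and details are just an illustration, not part of the sednn scripts, and it assumes the speech and noise clips are already the same length and sample rate.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Mix a speech clip with a noise clip at a target SNR (in dB).

    Both inputs are 1-D float arrays of the same length; the speech is
    left untouched and the noise is rescaled to reach the target SNR.
    """
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    # Choose a scale so that 10*log10(speech_power / (scale**2 * noise_power)) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise
```

Calling this with a random SNR for every training example gives you a different mixture each time, so you are no longer limited by what fits in one pre-mixed h5 file.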

Best wishes,

Qiuqiang



akshayaCap commented 6 years ago

Hi Qiuqiang, glad to hear from you. "I think this file might be too big for your 32 GB of memory. You can either reduce the size of your data, or alternatively mix the speech with the noise on the fly, so that you effectively have an infinite number of mixtures."

In that case, how do I generate the scalar.p file, which stores the transformation applied to each training and test sample?

Thank you, Akshaya

qiuqiangkong commented 6 years ago

Hi Akshaya,

That is, you load both the TIMIT and the noise data into memory. Then you write a data generator that randomly mixes them on the fly to produce each minibatch. Then there is no need to use scalar.p any more.
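
For example, a generator along the lines of the sketch below; the function and its arguments are only placeholders, assuming the TIMIT utterances and noise clips are held in memory as waveform arrays, each at least one segment long.

```python
import numpy as np

def mixture_generator(speech_clips, noise_clips,
                      snr_choices=(0, 10, 15, 20, 30),
                      segment_len=16384, batch_size=32, seed=None):
    """Yield (noisy, clean) waveform minibatches by mixing random speech and
    noise segments at a random SNR drawn from snr_choices.

    speech_clips and noise_clips are lists of 1-D waveform arrays already
    loaded into memory. Feature extraction (STFT, log magnitude, framing)
    would be applied afterwards, exactly as in the normal sednn pipeline.
    """
    rng = np.random.default_rng(seed)
    while True:
        noisy_batch, clean_batch = [], []
        for _ in range(batch_size):
            s_clip = speech_clips[rng.integers(len(speech_clips))]
            n_clip = noise_clips[rng.integers(len(noise_clips))]
            # Pick random segments of equal length from each clip.
            s0 = rng.integers(0, len(s_clip) - segment_len + 1)
            n0 = rng.integers(0, len(n_clip) - segment_len + 1)
            s = s_clip[s0:s0 + segment_len]
            n = n_clip[n0:n0 + segment_len]
            # Rescale the noise segment so the mixture reaches the chosen SNR.
            snr_db = float(rng.choice(snr_choices))
            s_pow = np.mean(s ** 2) + 1e-12
            n_pow = np.mean(n ** 2) + 1e-12
            scale = np.sqrt(s_pow / (n_pow * 10.0 ** (snr_db / 10.0)))
            noisy_batch.append(s + scale * n)
            clean_batch.append(s)
        yield np.stack(noisy_batch), np.stack(clean_batch)
```

Each call to the generator gives one fresh minibatch, so the training loop never needs the 27 GB pre-mixed file.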

Best wishes,

Qiuqiang



akshayaCap commented 6 years ago

Hi Qiuqiang,

" That is, you load both timit and noise data to memory. Then you you write a data generator and randomly mix them on the fly to get a minibatch of data. Then no need to use scalar.p any more. "

As per my understanding, scaler.p stores the mean and standard deviation of the entire training data, which are used to standardize the training and test features by transforming them into another space. It is also needed at inference time on unseen audio (not in the training or test set). This is the enhancement parameter for global variance equalization (alpha bar) mentioned in the reference paper: "Global Variance Equalization for Improving Deep Neural Network Based Speech Enhancement".

Please let me know if I am missing something.

qiuqiangkong commented 6 years ago

Hi Akshaya,

Yes, you can calculate the scaler over all the speech and noise after loading them into memory.
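
A rough sketch of what that could look like; using sklearn's StandardScaler with partial_fit is just my suggestion here (the function name and output path are placeholders), but it lets the statistics accumulate minibatch by minibatch instead of reading the whole 27 GB file at once.

```python
import pickle
from sklearn.preprocessing import StandardScaler

def compute_and_save_scaler(feature_batches, out_path="scaler.p"):
    """Fit a per-frequency-bin mean/std scaler incrementally over minibatches
    of log-magnitude features (each batch shaped [..., n_freq_bins]) and
    pickle it, so the full feature set never has to sit in memory at once.
    """
    scaler = StandardScaler()
    for batch in feature_batches:
        # Flatten everything except the frequency axis before updating the stats.
        scaler.partial_fit(batch.reshape(-1, batch.shape[-1]))
    with open(out_path, "wb") as f:
        pickle.dump(scaler, f)
    return scaler
```

You can feed it the minibatches produced by the on-the-fly mixing generator, then apply scaler.transform to features at training and inference time.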

Best wishes,

Qiuqiang

