yongxuUSTC / sednn

deep learning based speech enhancement using keras or pytorch, make it easy to use
http://staff.ustc.edu.cn/~jundu/The%20team/yongxu/demo/SE_DNN_taslp.html
334 stars 124 forks

Problem in the size of training set #15

Open lupengliu opened 6 years ago

lupengliu commented 6 years ago

Hello Yong, qiuqiang kong,

I am really impressed with your work. I'm trying to use the Python code to train my own model with the TIMIT data set. However, I found it difficult to use all 4620 utterances as the training set because of CPU memory limits. Could you tell me how many utterances you used to train your best model, and how many noise types? (I'm using n1~n100, i.e. 100 noise audios; also, which value of magnification should I use?) If I use all 4620 utterances as clean speech and apply each of the 100 noises to every one of them, the resulting training set is far too large to load into CPU memory.
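For a rough sense of scale, here is my back-of-the-envelope estimate (the frame count, FFT size and context width below are my own guesses, not the repo's exact settings):

```python
# Rough estimate of the in-memory feature size if all mixtures are pre-computed.
n_speech = 4620        # TIMIT training utterances
n_noise = 100          # n1~n100 noise files
frames_per_utt = 300   # ~3 s per utterance at a 10 ms hop (assumed)
n_freq = 257           # 512-point FFT -> 257 bins (assumed)
n_context = 7          # context frames per input vector (assumed)
bytes_per_float = 4    # float32

n_mixtures = n_speech * n_noise
input_bytes = n_mixtures * frames_per_utt * n_freq * n_context * bytes_per_float
print(f"{n_mixtures:,} mixtures -> roughly {input_bytes / 1e12:.1f} TB of input features")
```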

Thank you very much!

BRs, Lupeng Liu

yongxuUSTC commented 6 years ago

Hi Lupeng,

Yes, the code has this problem. One way is to split your data into several parts and load them into CPU memory one part at a time. The second way is to rewrite the code in an on-the-fly mode, i.e. randomly selected speech and noise are mixed during training rather than before training.
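As a rough illustration of the second option, here is a minimal sketch of an on-the-fly mixing generator (the file lists, audio reader and SNR handling are placeholder choices, not the actual sednn code):

```python
import random
import numpy as np
import soundfile as sf  # any audio reader would do; used here only for the sketch

def mix_on_the_fly(speech_paths, noise_paths, snr_db=10.0):
    """Yield (noisy, clean) waveform pairs, mixing a randomly chosen noise into a
    randomly chosen speech file at training time instead of writing mixtures to disk."""
    while True:
        speech, _ = sf.read(random.choice(speech_paths))
        noise, _ = sf.read(random.choice(noise_paths))

        # Loop the noise if it is shorter than the speech, then cut a random segment.
        if len(noise) < len(speech):
            noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
        start = random.randint(0, len(noise) - len(speech))
        noise = noise[start:start + len(speech)]

        # Scale the noise so the mixture reaches the target SNR.
        alpha = np.sqrt(np.sum(speech ** 2) /
                        (np.sum(noise ** 2) * 10 ** (snr_db / 10.0) + 1e-8))
        yield speech + alpha * noise, speech
```

Feature extraction (log-magnitude spectra plus context frames) would then run per mini-batch inside the training loop, so only one batch of features needs to fit in memory.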

Best regards, yong


Yong XU https://sites.google.com/view/xuyong/home


lupengliu commented 6 years ago

Hi Yong,

Thank you for your kind reply! I have tried your suggestion and trained again. Unfortunately, the training loss stays much higher than 0 (about 0.24) even when I increase the number of iterations to 500000 (the original setting is 10000). The training set I use is the 4620 TIMIT sentences and the n1-n115 noise audios. I also set the train & test SNR to 10.

Yong, could you please share your hyper-parameter configuration, for example the number of iterations, learning rate, batch size, and choice of optimizer? I really want to reproduce your excellent result.

Thank you very much!

BRs, Lupeng

qiuqiangkong commented 6 years ago

Hi Lupeng,

The code https://github.com/yongxuUSTC/sednn/tree/master/mixture2clean_dnn is a re-implementation of the paper [1]. The parameters might be slightly different, but we observed that this makes little difference to the result. You can find all parameter settings in the code.

The training loss will not go to zero. We trained for 10000 iterations with batch_size=500, learning_rate=1e-4 and the Adam optimizer, which should give a quite promising result.
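For reference, a minimal Keras sketch with those settings (the fully connected architecture and input shape are assumptions for illustration, not necessarily the exact sednn model):

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam

n_concat, n_freq = 7, 257   # assumed: 7 context frames of 257-bin log spectra

model = Sequential([
    Dense(2048, activation='relu', input_shape=(n_concat * n_freq,)),
    Dropout(0.2),
    Dense(2048, activation='relu'),
    Dropout(0.2),
    Dense(n_freq, activation='linear'),   # regress the clean log-magnitude spectrum
])
model.compile(loss='mean_squared_error',
              optimizer=Adam(lr=1e-4))    # newer Keras versions use learning_rate=1e-4

# Roughly 10000 iterations at batch_size=500, e.g. driven by a data generator:
# model.fit_generator(train_gen, steps_per_epoch=100, epochs=100)
```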

[1] Yong Xu, Jun Du, Li-Rong Dai and Chin-Hui Lee, "A Regression Approach to Speech Enhancement Based on Deep Neural Networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 1, pp. 7-19, 2015.

Best wishes,

Qiuqiang



XIEchoAH commented 6 years ago

@qiuqiangkong Hello, I followed your experimental parameters and the training conditions given in the paper, set the training set to 100+ hours, and used multi-SNR batch training. However, when testing on the 15 Noise92 noises, the PESQ plateaus at around 2.5, which is 0.3 below the paper (the PESQ of the unprocessed noisy test data is 2.20, the same as in the paper). Neither enlarging the training set further nor increasing the number of iterations improves the result. All training and test data are confirmed to be 16 kHz. What potential problems do you see, or what improvements could raise the PESQ further?

Thank you!

yongxuUSTC commented 6 years ago

Hi,

My best model and a demo (with MATLAB decoding) are available here: https://github.com/yongxuUSTC/DNN-Speech-enhancement-demo-tool. To further improve PESQ: dropout, noise-aware training, mask-based post-processing, etc.
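As a very rough sketch of the mask-based post-processing idea (my own simplified version for illustration, not the exact method of [2]; the threshold and floor values are arbitrary):

```python
import numpy as np

def mask_post_process(noisy_mag, enhanced_mag, threshold=0.3, floor=0.05):
    """Simplified mask-based post-processing: derive a ratio-mask estimate from the
    DNN-enhanced magnitude and further attenuate bins it marks as noise-dominated.
    The threshold/floor values are illustrative, not the ones used in [2]."""
    eps = 1e-8
    mask = np.clip(enhanced_mag / (noisy_mag + eps), 0.0, 1.0)   # IRM-like estimate
    return np.where(mask < threshold, floor * noisy_mag, enhanced_mag)
```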

[1] Yong Xu, Jun Du, Li-Rong Dai and Chin-Hui Lee, "A Regression Approach to Speech Enhancement Based on Deep Neural Networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 1, pp. 7-19, 2015.
[2] Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai and Chin-Hui Lee, "Multi-Objective Learning and Mask-Based Post-Processing for Deep Neural Network Based Speech Enhancement," Interspeech 2015.

Some DNN-based speech enhancement demos:
http://staff.ustc.edu.cn/~jundu/The%20team/yongxu/demo/SE_DNN_taslp.html
http://staff.ustc.edu.cn/~jundu/The%20team/yongxu/demo/IS15.html

Best regards, yong


Yong XU https://sites.google.com/view/xuyong/home


qiuqiangkong commented 6 years ago

Hi Lupeng,

You may also consider using different losses, for example an L1 or L2 loss. We did not look into this in detail, but they produce different enhancement results.
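For example, in Keras switching between the two is just a different loss identifier at compile time (a sketch with a placeholder model, not the actual sednn one):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# Tiny placeholder regression model, just to show the compile step.
model = Sequential([Dense(257, activation='linear', input_shape=(7 * 257,))])

model.compile(loss='mean_squared_error', optimizer=Adam(lr=1e-4))     # L2 loss
# model.compile(loss='mean_absolute_error', optimizer=Adam(lr=1e-4))  # L1 loss
```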

Best wishes,

Qiuqiang

