Open ucasiggcas opened 5 years ago
If I set TIMIT/TEST instead of noise/timit_coretest, I get 10 raw files in SE\data\test\clean, and if I then run make_test_noisy.m directly, I get 450 raw files in SE\data\test\noisy.
The number of noisy files is not equal to the number of clean files!! What's wrong?
If nothing is wrong, then how do I set up the validation dataset? You mentioned randomly selecting about 50 noisy utterances with the corresponding clean utterances from the test set, then moving them to prj_dir/SE/data/valid/noisy and prj_dir/SE/data/valid/clean.
How can I solve this?
Thx
@jtkim-kaist
If I set test_list to TEST under the TIMIT folder, I get the same result as you: 10 RAW files in the clean folder. However, when I set test_list to noise/timit_coretest, nothing is generated in either the clean or the noisy folder, and the following appears in the MATLAB command window:

When I ran the file directly, I changed "parfor i = 1:1:10" in the make_clean and make_noisy blocks to "parfor i = 1:1:length(timit_list)" because of an error:

I also just found out that make_train_noisy.m has a parameter aug, which I set to aug = 1, and in the program:

So after running make_train_noisy.m we only get one RAW file each in the noisy and clean folders under /SE/data/train/. Is that because of this parameter?
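For reference, a minimal sketch of that loop change, assuming timit_list is the struct array of clean files the script builds (the placeholder path and loop body below are illustrative, not the repo's actual code):

```matlab
% Placeholder: however the script actually collects the clean utterances
timit_list = dir(fullfile('speech', 'timit_coretest', '**', '*.wav'));

% Before: only the first 10 utterances were processed
% parfor i = 1:1:10

% After: iterate over every file in the list
parfor i = 1:1:length(timit_list)
    % ... read timit_list(i), add noise, write the .raw output ...
end
```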
You mean ./speech/timit_coretest? It's just a subset of the TIMIT test set.
No, you don't need to resample; the v_addnoise function resamples automatically (but I used 16000). This script is for data augmentation, so it will generate (# clean wavs) × (# SNRs) × (# noise types) noisy files.
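A rough sketch of that mixing step, assuming VOICEBOX's v_addnoise(speech, fs, snr_dB, mode, noise, noise_fs) call form; the file names and SNR value below are placeholders, not the script's actual settings:

```matlab
% Mix one clean utterance with one NoiseX-92 noise at a chosen SNR.
[s, fs_s] = audioread('clean/si1234.wav');       % clean TIMIT utterance (16 kHz)
[n, fs_n] = audioread('NoiseX-92/babble.wav');   % noise file, possibly not 16 kHz
snr_db = 5;                                      % example SNR

% v_addnoise resamples the noise to fs_s internally, so NoiseX-92 does not
% need to be resampled beforehand ('' keeps the default mixing mode).
noisy = v_addnoise(s, fs_s, snr_db, '', n, fs_n);

audiowrite('noisy/si1234_babble_5dB.wav', noisy, fs_s);
```

Repeating this over every clean file, SNR, and noise type yields N × S × T noisy files, which is why 10 clean files turned into 450 noisy files above.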
I also used randomly chosen utterances from the test set as the validation set.
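A minimal sketch of one way to carve out that validation split; the 50-file count and the prj_dir/SE/data layout come from this thread, but the selection code itself is my own assumption (including the assumption that noisy file names start with the clean utterance's base name):

```matlab
% Randomly move ~50 clean utterances (and their noisy versions) from the test
% split into the validation split.
clean_dir  = fullfile('SE','data','test','clean');
noisy_dir  = fullfile('SE','data','test','noisy');
vclean_dir = fullfile('SE','data','valid','clean');
vnoisy_dir = fullfile('SE','data','valid','noisy');
if ~exist(vclean_dir,'dir'), mkdir(vclean_dir); end
if ~exist(vnoisy_dir,'dir'), mkdir(vnoisy_dir); end

clean_files = dir(fullfile(clean_dir, '*.raw'));
pick = randperm(numel(clean_files), min(50, numel(clean_files)));

for idx = pick
    [~, base, ~] = fileparts(clean_files(idx).name);
    movefile(fullfile(clean_dir, clean_files(idx).name), vclean_dir);
    % Move every noisy file derived from this clean utterance (naming assumption).
    matches = dir(fullfile(noisy_dir, [base '*.raw']));
    for j = 1:numel(matches)
        movefile(fullfile(noisy_dir, matches(j).name), vnoisy_dir);
    end
end
```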
@MayMiao0923 The aug parameter adjusts the data size: if you set aug larger, you obtain more varied training data, because the noise type and SNR are chosen randomly while building the training set.
@MayMiao0923
The TIMIT folder you sent me doesn't contain noise/timit_coretest.
Is it because aug is 1 that I can only get one RAW file in each of the clean and noisy folders?
So, if I want more training data, do I need to set aug to a larger value, such as 300 or 1000? @jtkim-kaist
:) 300 is too much. I usually set aug to 20. However, if you have a much larger dataset than the one I introduced, 300 may be appropriate.
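To make the effect of aug concrete, here is an illustrative sketch of the idea described above (this is not the repo's actual make_train_noisy.m; the SNR list, noise subset, and paths are placeholders). Each augmentation pass mixes every clean utterance with a randomly chosen noise type and SNR and writes one concatenated .raw file per pass, which matches aug = 1 producing a single file under SE/data/train/:

```matlab
aug        = 20;                                % value suggested above
snr_set    = [-5 0 5 10];                       % placeholder SNR candidates
noise_set  = {'babble','factory1','white'};     % placeholder NoiseX-92 subset
clean_list = dir(fullfile('SE','data','train','clean','*.wav'));  % placeholder path

for a = 1:aug
    fid = fopen(sprintf('SE/data/train/noisy/train_noisy_%02d.raw', a), 'w');
    for i = 1:numel(clean_list)
        [s, fs] = audioread(fullfile(clean_list(i).folder, clean_list(i).name));
        noise   = noise_set{randi(numel(noise_set))};   % random noise type
        snr_db  = snr_set(randi(numel(snr_set)));       % random SNR
        [n, fn] = audioread(fullfile('NoiseX-92', [noise '.wav']));
        noisy   = v_addnoise(s, fs, snr_db, '', n, fn);
        fwrite(fid, int16(noisy * 32767), 'int16');     % raw 16-bit PCM
    end
    fclose(fid);
end
```

So a larger aug value gives proportionally more (and more varied) training data.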
@ucasiggcas The TIMIT corpus I shared with you was downloaded from the TIMIT link given by the author, and I also did not find "noise/timit_coretest". Excuse me, is your test data a dataset that you made yourself? @jtkim-kaist
Yes. In the TIMIT documentation I found the list of core-test speakers, and from the whole TEST set I just selected those core-test files.
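In case it helps, a rough sketch of how such a timit_coretest folder could be assembled from TIMIT/TEST (the speaker IDs shown are only examples; the full list of core-test speakers is in the TIMIT documentation, and this is not code from the repo):

```matlab
% Copy the core-test speakers' folders out of TIMIT/TEST into speech/timit_coretest.
core_speakers = {'MDAB0','MWBT0','FELC0'};   % ...extend with the remaining core-test speakers
src_root = fullfile('TIMIT','TEST');
dst_root = fullfile('speech','timit_coretest');

dialects = dir(fullfile(src_root, 'DR*'));
for d = 1:numel(dialects)
    for s = 1:numel(core_speakers)
        spk_dir = fullfile(src_root, dialects(d).name, core_speakers{s});
        if exist(spk_dir, 'dir')
            copyfile(spk_dir, fullfile(dst_root, core_speakers{s}));
        end
    end
end
```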
@MayMiao0923 What does he mean by that? Did he also pick some files from TIMIT/TEST, or did he choose wav files at random?
I think if he built it himself he should share it. Since the project is open-source, he shouldn't add tricks that make it harder for others to reproduce the results; that's the most frustrating part.
If I am not mistaken, you selected the core-test files from TEST according to Table 1 in the test-set document in the DOC folder. Is that correct? And when running make_test_noisy.m, the value of k should be determined by "parfor k = 1:1:length(timit_list)" instead of "parfor k = 1:1:10". @jtkim-kaist
@ucasiggcas I'm frustrated too. His coretest should be the utterances selected according to that Table 1.
The validation set is also hard to choose; I have no idea where the 50 files are supposed to come from. I'm completely lost.
The explanation is unclear; it feels like he wants to share and yet doesn't want to at the same time. I think he's conflicted about it himself.
From the context, the 50 files should be selected from the test set.
But with the test set I generated, I have no idea what to do.
Did he process all of TEST first and then select 50 files from it as the later test data? @ucasiggcas
@MayMiao0923 Yes, you are correct. The 10 was just for a small test dataset; the length(timit_list) version you mentioned is the right one.
@MayMiao0923 If I set parfor k = 1:1:length(timit_list) and use coreTest, I get 10800 raw files under test/noisy/, but test/clean still has only 10 files.
So how do I choose the 50 files for the validation set? Where exactly are those 50 files supposed to come from?
Following the Table 1 I mentioned above, I picked out the speech folders of every speaker listed in the table and collected them into timit_coretest:
With k set to parfor k = 1:1:length(timit_list), I get 10800 raw files under test/noisy/ and 240 files under test/clean.
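For what it's worth, both file counts match the (# clean wavs) × (# noise types) × (# SNRs) formula mentioned earlier in the thread:

450 / 10 = 45 and 10800 / 240 = 45

so the noisy set is always 45 × the clean set (for example 15 noise types × 3 SNRs, though the exact split between noise types and SNR levels isn't stated here).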
Same as you.
Dear,
In the script make_test_noisy.m:
1. What is noise/timit_coretest? Is it the files in TIMIT/TEST as below?
2. Do I need to resample noise/NoiseX-92? I see the folder NOISEX-92_16000, but the sample rate of the wav files in NoiseX-92 is not 16000. Should I resample first, or just run the script directly?
Please help me. SOS. Thx
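As answered above, resampling shouldn't be necessary since v_addnoise handles it internally, but if you want a 16 kHz copy of NoiseX-92 on disk anyway, a minimal sketch (paths are placeholders; resample is from the Signal Processing Toolbox):

```matlab
% Make a 16 kHz copy of one NoiseX-92 file (optional; v_addnoise resamples internally).
[n, fs_in] = audioread('NoiseX-92/babble.wav');
fs_out = 16000;
n16 = resample(n, fs_out, fs_in);    % rational-rate resampling
audiowrite('NOISEX-92_16000/babble.wav', n16, fs_out);
```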