How to deal with make_test_noisy.m?

ucasiggcas commented 5 years ago

Hello,

In the script make_test_noisy.m:

1. What is noise/timit_coretest? Is it the set of files in TIMIT/TEST?
2. Do I need to resample noise/NoiseX-92? I see NOISEX-92_16000 in the script, but the sample rate of the wav files in NoiseX-92 is not 16000 Hz. Should I resample them first, or just run the script directly?

Please help! Thanks.

ucasiggcas commented 5 years ago

If I use TIMIT/TEST instead of noise/timit_coretest, I get 10 raw files in SE\data\test\clean,

and if I run make_test_noisy.m directly, I get 450 raw files in SE\data\test\noisy.

The number of noisy files is not equal to the number of clean files! What's wrong? And if nothing is wrong,

how should I set up the validation dataset? Previously, I randomly selected about 50 noisy utterances with their corresponding clean utterances from the test set, then moved them to prj_dir/SE/data/valid/noisy and prj_dir/SE/data/valid/clean.

How can I solve this?

Thanks.

ucasiggcas commented 5 years ago

@jtkim-kaist

MayMiao0923 commented 5 years ago

If I set test_list to the TEST folder under TIMIT, I get the same result as you: 10 raw files in the clean folder. However, when I set test_list to noise/timit_coretest, nothing is generated in either the clean or the noisy folder, and the following appears in MATLAB's Command Window: [screenshot]

When I ran this file directly, I changed "parfor i = 1:1:10" in the make_clean and make_noisy blocks to "parfor i = 1:1:length(timit_list)" because of an error: [screenshot] [screenshot]

I also just found that in make_train_noisy.m I had set the parameter aug to 1 (aug=1), and it is used in the program as follows: [screenshot]

So after running make_train_noisy.m, we only get one raw file each in the noisy and clean folders under /SE/data/train/. Is that because of this parameter?

jtkim-kaist commented 5 years ago

Do you mean ./speech/timit_coretest? It is just a subset of the TIMIT test set.
No, you don't need to resample; the v_addnoise function resamples automatically (but I used 16000 Hz).
This script is for data augmentation, so it produces (# clean wavs) × (# SNRs) × (# noise types) noisy files.
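To illustrate that count, here is a minimal Python sketch that enumerates every (clean file, SNR, noise type) combination, as a full-factorial augmentation would. The helper plan_noisy_files and the SNR and noise lists are hypothetical placeholders, not the lists hard-coded in make_test_noisy.m. With 10 clean files, any 45 (SNR, noise type) combinations per file would give the 450 noisy files reported earlier in this thread.

```python
from itertools import product


def plan_noisy_files(clean_files, snrs_db, noise_types):
    """Return one (clean, snr, noise) job per noisy file to generate.

    Full-factorial design: every clean file is mixed with every noise type
    at every SNR, so the job count is the product of the three list sizes.
    """
    return list(product(clean_files, snrs_db, noise_types))


# Hypothetical inputs for illustration only (not the script's real lists).
clean = [f"clean_{i:02d}.wav" for i in range(10)]
snrs = [-5, 0, 5, 10, 15]                # assumed SNR list in dB
noises = [f"noise_{j}" for j in range(9)]  # assumed noise-type list

jobs = plan_noisy_files(clean, snrs, noises)
print(len(jobs))  # 450
```

So a mismatch between the number of clean and noisy files is expected: each clean file appears once in the clean folder but many times in the noisy folder.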

I also used randomly chosen utterances from the test set as the validation set.
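One way to do that random split is sketched below in Python, assuming you already have a list of (noisy_path, clean_path) pairs. How noisy files map to their clean source depends on the script's naming, so building that list is left out, and pick_validation_pairs and export_pairs are hypothetical helpers, not part of this repository. The noisy files are moved out of the test set, while the clean files are copied, on the assumption that one clean utterance is shared by many noisy versions that stay in the test set.

```python
import random
import shutil
from pathlib import Path


def pick_validation_pairs(pairs, k=50, seed=0):
    """Randomly choose k (noisy_path, clean_path) pairs without replacement."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    return rng.sample(pairs, k)


def export_pairs(pairs, valid_root):
    """Move chosen noisy files and copy their clean files under valid_root.

    Layout assumed: valid_root/noisy and valid_root/clean, mirroring
    prj_dir/SE/data/valid/{noisy,clean}.
    """
    noisy_dir = Path(valid_root) / "noisy"
    clean_dir = Path(valid_root) / "clean"
    noisy_dir.mkdir(parents=True, exist_ok=True)
    clean_dir.mkdir(parents=True, exist_ok=True)
    for noisy, clean in pairs:
        # Remove the noisy file from the test set so it is not evaluated twice.
        shutil.move(str(noisy), str(noisy_dir / Path(noisy).name))
        # Copy (not move) the clean reference; other noisy files still need it.
        shutil.copy2(str(clean), str(clean_dir / Path(clean).name))
```

For example, `export_pairs(pick_validation_pairs(pairs, k=50), "prj_dir/SE/data/valid")` would move 50 noisy files and copy their clean references.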

jtkim-kaist commented 5 years ago

@MayMiao0923 The aug parameter controls the size of the dataset. If you set aug larger, you obtain more varied training data, because the noise type and SNR are chosen randomly while building the training set.
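A minimal Python sketch of why a larger aug gives more varied data, assuming each aug pass visits every clean utterance once and draws a noise type and an SNR at random. build_training_plan and mix_at_snr are hypothetical helpers, not functions from this repository, and mix_at_snr is a textbook power-ratio scaling rather than v_addnoise itself (which, as noted above, also resamples).

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean, scaled so the mixture has the requested SNR (dB).

    Assumes both signals are equal-length sample lists at the same rate.
    """
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    # Pick gain g so that 10*log10(p_clean / (g^2 * p_noise)) == snr_db.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def build_training_plan(clean_ids, noise_types, snrs_db, aug, seed=0):
    """Each aug pass pairs every clean utterance with a random noise and SNR."""
    rng = random.Random(seed)
    return [
        (pass_idx, cid, rng.choice(noise_types), rng.choice(snrs_db))
        for pass_idx in range(aug)
        for cid in clean_ids
    ]


# Hypothetical example: 3 utterances, aug=20 -> 60 randomly drawn mixtures.
plan = build_training_plan(["u1", "u2", "u3"], ["noise_a", "noise_b"],
                           [0, 5, 10], aug=20)
print(len(plan))  # 60
```

Because every pass draws fresh (noise, SNR) choices, increasing aug adds new combinations rather than exact duplicates.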

ucasiggcas commented 5 years ago

@MayMiao0923 The TIMIT files you sent me don't contain noise/timit_coretest.

MayMiao0923 commented 5 years ago

So, because the value of aug is 1, I only get one raw file each in the clean and noisy folders? [screenshot] [screenshot]

So if I want more training data, do I need to set aug to a larger value, such as 300 or 1000? @jtkim-kaist

jtkim-kaist commented 5 years ago

:) 300 is too much; I usually set aug to 20. However, if you have a much larger dataset than the one I described, 300 would be appropriate.

MayMiao0923 commented 5 years ago

@ucasiggcas The TIMIT corpus I shared with you was downloaded from the TIMIT link the author provided. I could not find "noise/timit_coretest" either. Excuse me, is your test data a dataset you built yourself? @jtkim-kaist

jtkim-kaist commented 5 years ago

Yes. The TIMIT documentation contains the list of core test files, and I selected just those files from the full TEST set.
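A Python sketch of that selection, assuming a standard TIMIT layout (TEST/DR*/SPEAKER/UTT.WAV). The speaker set below is the 24-speaker core test list as it is commonly published; verify it against Table 1 in DOC/TESTSET.TXT of your copy, and note that the author's exact timit_coretest selection may differ. select_core_test is a hypothetical helper.

```python
from pathlib import PurePosixPath

# The 24 TIMIT core-test speakers as commonly listed; check against
# Table 1 of DOC/TESTSET.TXT before relying on it.
CORE_TEST_SPEAKERS = {
    "MDAB0", "MWBT0", "FELC0", "MTAS1", "MWEW0", "FPAS0",
    "MJMP0", "MLNT0", "FPKT0", "MLLL0", "MTLS0", "FJLM0",
    "MBPM0", "MKLT0", "FNLP0", "MCMJ0", "MJDH0", "FMGD0",
    "MGRT0", "MNJM0", "FDHC0", "MJLN0", "MPAM0", "FMLD0",
}


def select_core_test(wav_paths):
    """Keep core-test utterances from a list of TIMIT TEST paths.

    Expects paths like TEST/DR1/MDAB0/SI1039.WAV (any letter case).
    The core test set uses only SI and SX sentences, so SA files are dropped.
    """
    selected = []
    for p in wav_paths:
        path = PurePosixPath(p)
        speaker = path.parent.name.upper()
        sentence = path.stem.upper()
        if speaker in CORE_TEST_SPEAKERS and not sentence.startswith("SA"):
            selected.append(p)
    return selected
```

Applied to the full TEST set, this should keep 8 utterances per core speaker (192 in total), which you would then point test_list at.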

ucasiggcas commented 5 years ago

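@MayMiao0923 What does he mean by that? Did he also just pick some files from TIMIT/TEST, or were the wav files chosen at random?

I think that if he created it himself, he should share it. Since the project is open source, he shouldn't introduce hidden tricks that make it harder for others to reproduce the results; that is the most frustrating part.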

MayMiao0923 commented 5 years ago

If I'm not mistaken, you selected the core test files from TEST according to Table 1 in the TESTSET file in the DOC folder. Is that correct? [screenshot]

Also, when running make_test_noisy.m, the loop range should be "parfor k = 1:1:length(timit_list)" instead of "parfor k = 1:1:10". [screenshot] @jtkim-kaist