wangkenpu / rsrgan

Robust Speech Recognition Using Generative Adversarial Networks (GAN)
MIT License

preparing dataset #7

Closed tingyang01 closed 4 years ago

tingyang01 commented 4 years ago

I was delighted to learn that this repository gives good results on reverberant speech, so I have tried to train the reverb model with the LibriSpeech corpus. I have finished generating the reverberant wave dataset, but I don't know how to build the MFCC and LPS datasets. I used this script to compute the MFCCs:
"""
if [ $stage -le 6 ]; then
  for part in dev_clean test_clean dev_other test_other train_clean_100; do
    steps/make_mfcc.sh --cmd "$train_cmd" --nj 40 data/$part exp/make_mfcc/$part $mfccdir
    steps/compute_cmvn_stats.sh data/$part exp/make_mfcc/$part $mfccdir
  done
fi
"""

For LPS, I replaced "compute-mfcc-feats" with "compute-spectrogram-feats". Then I tried to use cmvn_train.ark and cmvn_train_reverb.ark from the "mfcc" directory as labels.cmvn and inputs.cmvn, but I got the following error at stage 0:
"""
Prepare tr and cv data
Make Numpy format Global CMVN file ...
Convert data/train/inputs.cmvn and data/train/labels.cmvn to Numpy format
Input .ark file is not binary
"""

I also don't know how to create cv.list and train.list. It would be greatly appreciated if you could tell me how to make them. Thanks, Ting

wangkenpu commented 4 years ago

You can find the details of converting a Kaldi binary-format CMVN file to Numpy format in my code. By the way, this code only supports Python 2.x.

So I think this issue is likely because your CMVN file is not a binary file. Maybe you should check the Kaldi command you used to generate the CMVN.
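A quick way to check a file before running the conversion is to look at its first bytes. The sketch below is only a heuristic, not the repository's converter: it assumes the usual Kaldi convention that a binary object starts with the two bytes `\0B`, and that an archive entry puts the utterance key and a space in front of that marker. The function name `looks_like_kaldi_binary` is made up for illustration.

```python
def looks_like_kaldi_binary(data):
    """Heuristically decide whether `data` (leading bytes of a file) is Kaldi binary.

    Assumption: a bare binary object starts with b"\\0B"; a binary archive
    entry is "<key> " followed by b"\\0B".
    """
    if data.startswith(b"\0B"):
        return True  # bare object such as a single global CMVN matrix
    key, sep, rest = data.partition(b" ")
    # Archive entry: a key without newlines, one space, then the marker.
    return bool(sep) and b"\n" not in key and rest.startswith(b"\0B")


def file_looks_binary(path, num_bytes=256):
    """Read the first bytes of `path` and apply the heuristic above."""
    with open(path, "rb") as handle:
        return looks_like_kaldi_binary(handle.read(num_bytes))
```

For example, `file_looks_binary("data/train/inputs.cmvn")` should return `False` for a file written in Kaldi text mode, which would match the "Input .ark file is not binary" message.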

tingyang01 commented 4 years ago

Thanks for your reply. Can I simply convert the .ark file into binary format? I think the .ark file contains the CMVN features for all audio files, while your Python file seems to be a parser for the CMVN file of a single audio file. Right?

tingyang01 commented 4 years ago

Could you share a sample CMVN file?

wangkenpu commented 4 years ago

> Thanks for your reply. Can I simply convert the .ark file into binary format? I think the .ark file contains the CMVN features for all audio files, while your Python file seems to be a parser for the CMVN file of a single audio file. Right?

I think I understand your issue now. For a speech enhancement task it is better to use global CMVN, but you used utterance-level CMVN. This is why you can't convert your CMVN file to Numpy format. I think you can refer to the Kaldi documentation to learn how to compute global CMVN.
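To illustrate the difference, here is a minimal NumPy sketch of global CMVN statistics. It is not the code from this repository. It assumes the Kaldi-style layout of a 2 x (dim + 1) stats matrix (row 0 holds the feature sums plus the frame count in the last column, row 1 holds the sums of squares), and the function names are hypothetical. In Kaldi itself, as far as I know, giving `compute-cmvn-stats` a plain output filename instead of an `ark:` specifier writes a single global stats matrix of this kind.

```python
import numpy as np


def accumulate_global_stats(utterances):
    """Sum CMVN statistics over all utterances (each: frames x dim array)."""
    dim = utterances[0].shape[1]
    stats = np.zeros((2, dim + 1))
    for feats in utterances:
        stats[0, :dim] += feats.sum(axis=0)         # first-order sums
        stats[1, :dim] += (feats ** 2).sum(axis=0)  # second-order sums
        stats[0, dim] += feats.shape[0]             # total frame count
    return stats


def stats_to_mean_std(stats):
    """Turn accumulated statistics into a per-dimension mean and std."""
    count = stats[0, -1]
    mean = stats[0, :-1] / count
    var = stats[1, :-1] / count - mean ** 2
    return mean, np.sqrt(np.maximum(var, 1e-20))  # floor to avoid sqrt(<0)
```

Utterance-level CMVN would produce one such matrix per utterance, whereas the global version yields a single mean and standard deviation for the whole training set.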

By the way, if you use utterance-level CMVN, how would you get the CLEAN utterance-level CMVN at test time when you want to convert the normalized LPS back to the original LPS? This is why we use global CMVN.
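A small sketch of that round trip, assuming the global mean and standard deviation of the clean training features have already been computed and stored (the function names and the toy numbers are illustrative only):

```python
import numpy as np


def normalize(feats, mean, std):
    """Apply global mean/variance normalization to a frames x dim array."""
    return (feats - mean) / std


def denormalize(normalized, clean_mean, clean_std):
    """Map normalized network outputs back to the original LPS scale.

    clean_mean and clean_std are fixed global statistics of the CLEAN
    training data, so they are available at test time without needing
    the clean version of the test utterance.
    """
    return normalized * clean_std + clean_mean


# Toy example with made-up numbers.
clean_mean = np.array([1.0, -2.0])
clean_std = np.array([0.5, 4.0])
lps = np.array([[1.5, 2.0], [0.0, -6.0]])
restored = denormalize(normalize(lps, clean_mean, clean_std), clean_mean, clean_std)
```

With utterance-level statistics, `clean_mean` and `clean_std` would have to come from the clean test utterance itself, which is exactly what is missing at test time.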

> Could you share a sample CMVN file?

I'm really sorry, but I can't. I don't have any suitable audio at hand to extract LPS/MFCC features and compute their CMVN.