Uploaded the code:
test_mask.py: predicts the masks with the LSTM model and saves them as .mat files.
stubI_LSTMMessl2.m: combines the LSTM mask with the MESSL mask. You have to specify lstm_dir, the directory where the LSTM masks are stored, and combineOpt, the way to combine the masks (average, max, or min); see the sketch below.
Need to test the code, waiting for the GPU to become available.
MESSL masks are here: /scratch/mim/chime3/messlMcMvdrMrf.2Hard5Lbp4Slate/
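Not the actual stubI_LSTMMessl2.m implementation, but a minimal Python sketch of the combination logic, assuming each mask is a time-frequency array and that the LSTM .mat files store it under a hypothetical key "mask":

```python
import numpy as np
from scipy.io import loadmat

def combine_masks(lstm_mask_file, messl_mask, combine_opt="average"):
    """Combine an LSTM mask (loaded from a .mat file) with a MESSL mask.

    The .mat key "mask" is an assumption; both masks are assumed to be
    time-frequency arrays of the same shape with values in [0, 1].
    combine_opt mirrors combineOpt: "average", "max", or "min".
    """
    lstm_mask = loadmat(lstm_mask_file)["mask"]
    if combine_opt == "average":
        return (lstm_mask + messl_mask) / 2.0
    elif combine_opt == "max":
        return np.maximum(lstm_mask, messl_mask)
    elif combine_opt == "min":
        return np.minimum(lstm_mask, messl_mask)
    raise ValueError("combineOpt must be 'average', 'max', or 'min'")
```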
For dt05_simu evaluation:
average SDR: 0.811886
average OPS: 54.909057
average SIR: Inf
average ISR: 1.101032
average SAR: 18.400806
The link for the spreadsheet: Evaluation
Thanks. Two things:
Okay. The reference for dt05_simu I used is the MVDR output audio files. They are stored in "/home/data/CHiME3/data/audio/16kHz/local/messl-mvdr-output/wav/".
The estimate files are in "/scratch/near/replayMessl/wav". I listened to one audio file; interestingly, the left ear's noise has been reduced, but the right ear's has not. I will check the mask*spectrogram part of the code.
The first channel is the output of the beamformer with the mask-based post-filter applied. The second channel is just the output of the beamformer with no post-filter. You should only run PEASS on one of the channels at a time. Use the first channel. I'll modify mvdrSoudenMulti not to do this, because it's caused a bunch of problems.
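Until that change is in, one workaround is to split off the first channel before scoring, rather than modifying PEASS itself. A hedged sketch (the file names are placeholders, and the soundfile package is assumed to be available):

```python
import soundfile as sf

# Keep only the first channel (beamformer + mask-based post-filter)
# so PEASS scores a single-channel estimate. File names are placeholders.
audio, sr = sf.read("estimate_stereo.wav")  # shape: (samples, channels)
first_channel = audio[:, 0] if audio.ndim > 1 else audio
sf.write("estimate_ch1.wav", first_channel, sr)
```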
It seems that modifying the PEASS code to use only the first channel is not so easy... so I will just wait for your modification and rerun it.
Ok, in afffb8c1fc47a6860bedf0f8e36ca88a453a9ba4 I removed the extra channels. There should just be one channel in the output of stubI_replayMessl (although I haven't tested it yet).
The reference for the simulated data should be the MVDR output (without channel 0) of the audio recorded in the booth, right? How can I get the MVDR result without using channel 0?
For the training data, the reference for simulated data should be the original wsj0 files, which are in tr05_org. For the dev and eval sets, yes, use the supervised MVDR of the booth files. Those use channel 0 for controlling the MVDR, so I'm not sure I understand your question. In stubI_supervisedMvdr, you can use the includeRef argument to control whether channel 0 is used as an input to the beamforming or just used to control the beamforming.
In reading through the PEASS code, it appears that we are using it incorrectly in several ways. First, you need to supply all of the reference files that created the mixture, both speech and noise. Second, it probably requires the multichannel versions of the original speech and noise (separately). For the simulated recordings, you can use the booth speech and the mixture minus the booth speech to recover the noise. Third, it might require a multichannel output. Try it with the single-channel output and see what happens. Also, listen to the intermediate separations it generates for the different components (signal, interference, artifacts). If it throws an error, then try BSS_EVAL instead, which might be able to work in this case (multichannel references, single-channel output).
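As a rough illustration of the "mixture minus booth speech" idea for recovering the noise reference, here is a sketch assuming the two multichannel recordings are aligned and equal in length (file names are placeholders):

```python
import soundfile as sf

# noise = mixture - booth speech, channel by channel.
mixture, sr = sf.read("mixture_multichannel.wav")
speech, sr_s = sf.read("booth_speech_multichannel.wav")
assert sr == sr_s and mixture.shape == speech.shape, "recordings must match"
noise = mixture - speech
sf.write("noise_reference.wav", noise, sr)
```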
Run the LSTM from #4 on the same mixture as MESSL is run on. Combine the masks by averaging them together after MESSL is done running.