speechLabBcCuny / messlJsalt15

MESSL wrappers etc for JSALT 2015, including CHiME3

LSTM-MESSL combination 2 #5

Closed: mim closed this issue 7 years ago

mim commented 8 years ago

Run the LSTM from #4 on the same mixture as MESSL is run on. Combine the masks by averaging them together after MESSL is done running.

nateanl commented 7 years ago

Uploaded the code:

test_mask.py: predicts the mask using the LSTM model and saves the masks as .mat files.

stubI_LSTMMessl2.m: combines the LSTM mask with the MESSL mask. You have to specify lstm_dir, the directory where the LSTM masks are stored, and combineOpt, the way to combine the masks (average, max, or min).

Need to test the code, waiting for the GPU to become available.
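
The combineOpt choices described above (average, max, min) amount to an elementwise operation on two time-frequency masks of the same shape. The following is a minimal Python sketch of that idea, not the code in stubI_LSTMMessl2.m: the function name `combine_masks` and the nested-list mask layout are assumptions made for illustration, and real masks would more likely be arrays loaded from the .mat files.

```python
def combine_masks(lstm_mask, messl_mask, combine_opt="average"):
    """Combine two masks of identical shape elementwise.

    Each mask is a list of rows (frequency x time) with values in [0, 1].
    combine_opt is one of "average", "max", or "min".
    """
    combiners = {
        "average": lambda a, b: 0.5 * (a + b),
        "max": max,
        "min": min,
    }
    if combine_opt not in combiners:
        raise ValueError(f"unknown combine_opt: {combine_opt!r}")
    same_shape = len(lstm_mask) == len(messl_mask) and all(
        len(row_l) == len(row_m) for row_l, row_m in zip(lstm_mask, messl_mask)
    )
    if not same_shape:
        raise ValueError("masks must have the same shape")

    combine = combiners[combine_opt]
    return [
        [combine(a, b) for a, b in zip(row_l, row_m)]
        for row_l, row_m in zip(lstm_mask, messl_mask)
    ]
```

With masks in [0, 1], "max" keeps a time-frequency point if either model keeps it, "min" keeps it only if both do, and "average" sits in between.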

mim commented 7 years ago

MESSL masks are here: /scratch/mim/chime3/messlMcMvdrMrf.2Hard5Lbp4Slate/

nateanl commented 7 years ago

For dt05_simu evaluation:

the average SDR is 0.811886
the average OPS is 54.909057
the average SIR is Inf
the average ISR is 1.101032
the average SAR is 18.400806

The link for the spreadsheet: Evaluation

mim commented 7 years ago

Thanks. Two things:

  1. Please put these numbers in the spreadsheet and post a link to the spreadsheet here.
  2. I'm a bit suspicious of these numbers, since the OPS is so high but the SDR is so low. I could also understand the ISR being infinite, because it deals with spatial information, which I don't think is present in the reference or the estimate, but not the SIR. What did you use for the reference?

nateanl commented 7 years ago

Okay. The reference I used for dt05_simu is the MVDR output audio files, which are stored in "/home/data/CHiME3/data/audio/16kHz/local/messl-mvdr-output/wav/"

The estimate files are in "/scratch/near/replayMessl/wav". I listened to one of the audio files. Interestingly, the noise in the left ear has been reduced, but the noise in the right ear has not. I will check the mask*spectrogram part of the code.

mim commented 7 years ago

The first channel is the output of the beamformer with the mask-based post-filter applied. The second channel is just the output of the beamformer with no post-filter. You should only run PEASS on one of the channels at a time. Use the first channel. I'll modify mvdrSoudenMulti not to do this, because it's caused a bunch of problems.
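
Until the beamformer output is single-channel, one way to follow this advice without touching the PEASS code is to write a mono copy of each file that contains only the first channel. Here is a minimal sketch using Python's standard `wave` module, assuming 16-bit PCM WAV files; `extract_channel` and the file names in the usage line are hypothetical.

```python
import array
import wave


def extract_channel(in_path, out_path, channel=0):
    """Write one channel of a 16-bit PCM WAV file to a mono WAV file.

    Returns the number of samples written.
    """
    with wave.open(in_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    if not 0 <= channel < n_channels:
        raise ValueError(f"file has {n_channels} channel(s)")

    # View the raw bytes as 16-bit words. The words are copied unchanged,
    # so their byte order does not matter here.
    samples = array.array("h")
    samples.frombytes(raw)

    # Samples are interleaved (ch0, ch1, ch0, ch1, ...), so a strided
    # slice picks out a single channel.
    mono = samples[channel::n_channels]

    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(frame_rate)
        dst.writeframes(mono.tobytes())
    return len(mono)
```

For example, `extract_channel("estimate_stereo.wav", "estimate_ch0.wav")` would keep only the post-filtered channel described above.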

nateanl commented 7 years ago

It seems that modifying the PEASS code to use only the first channel is not so easy... So I will just wait for your modification and then rerun it.

mim commented 7 years ago

Ok, in afffb8c1fc47a6860bedf0f8e36ca88a453a9ba4 I removed the extra channels. There should just be one channel in the output of stubI_replayMessl (although I haven't tested it yet).

nateanl commented 7 years ago

The reference for the simulated data should be the MVDR output (without channel 0) of the audio recorded in the booth, right? How can I get the MVDR result without using channel 0?

mim commented 7 years ago

For training data, the reference for simulated data should be the original wsj0 files, which are in tr05_org. For the dev and eval sets, yes, use the supervised MVDR of the booth files. Those use channel 0 for controlling the MVDR, so I'm not sure I understand your question. In stubI_supervisedMvdr, you can use the includeRef argument to control whether channel 0 is used as an input to the beamforming or just used to control the beamforming.

mim commented 7 years ago

In reading through the PEASS code, it appears that we are using it incorrectly in several ways. First, you need to supply all of the reference files that created the mixture, both speech and noise. Second, it seems like it probably requires the multichannel versions of the original speech and noise (separately). For the simulated recordings, you can use the booth speech and the mixture minus the booth speech to recover the noise. Third, it might require a multichannel output. Try it with the single-channel output and see what happens. Also, listen to the intermediate separations it generates for the different components (signal, interference, artifacts). If it throws an error, then try BSS_EVAL instead, which might be able to work in this case (multichannel references, single-channel output).
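
For the simulated recordings, the noise reference described above can be obtained by subtracting the booth speech from the mixture, channel by channel. This is a minimal sketch, assuming the two signals are time-aligned, have the same number of channels, and use the same scaling; `recover_noise` and the list-of-channels layout are hypothetical and are not part of PEASS or BSS_EVAL.

```python
def recover_noise(mixture, booth_speech):
    """Return mixture - booth_speech, per channel and per sample.

    Both arguments are lists of channels; each channel is a list of samples.
    """
    if len(mixture) != len(booth_speech):
        raise ValueError("channel counts differ")

    noise = []
    for mix_ch, speech_ch in zip(mixture, booth_speech):
        if len(mix_ch) != len(speech_ch):
            raise ValueError("channel lengths differ; align the signals first")
        noise.append([m - s for m, s in zip(mix_ch, speech_ch)])
    return noise
```

The booth speech and the recovered noise would then be passed together as the reference set, so that the evaluation has an interference signal to measure against.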