speechLabBcCuny / messlJsalt15

MESSL wrappers etc for JSALT 2015, including CHiME3

Test result of the masks #25

Open nateanl opened 6 years ago

nateanl commented 6 years ago

I tested the stubI_MesslKeras code on 7 dt05_str_real files. Here are the results.

nateanl commented 6 years ago

Comparisons among different combinations of the masks:

[Figures: speech_mask, noise_mask, post_filter, spectrogram]

nateanl commented 6 years ago

Here are the LSTM masks for all channels. [Figure: lstm_masks]

mim commented 6 years ago

Channels 3-6 are very similar to each other, while channels 1 and 2 are quite different. Any idea why?

nateanl commented 6 years ago

The speech in channel 2 is hidden. I can barely hear the speaker. I checked the failure array, and none of the microphones is flagged as failed. You can listen to them:
/home/data/CHiME3/data/audio/16kHz/isolated/dt05_str_real/F01_050C0108_STR.CH2.wav
/home/data/CHiME3/data/audio/16kHz/isolated/dt05_str_real/F01_050C0107_STR.CH2.wav
/home/data/CHiME3/data/audio/16kHz/isolated/dt05_str_real/F01_050C0106_STR.CH2.wav
/home/data/CHiME3/data/audio/16kHz/isolated/dt05_str_real/F01_050C0103_STR.CH2.wav
/home/data/CHiME3/data/audio/16kHz/isolated/dt05_str_real/F01_050C010B_STR.CH2.wav
/home/data/CHiME3/data/audio/16kHz/isolated/dt05_str_real/F01_050C010A_STR.CH2.wav
/home/data/CHiME3/data/audio/16kHz/isolated/dt05_str_real/F01_050C0102_STR.CH2.wav

nateanl commented 6 years ago

Because the energy of the noise is very high, we can't exclude the channel in mask combination and beamforming. Any suggestions for this case?

mim commented 6 years ago

I don't understand what you mean. We can use all of the channels, and we want to use all of the channels.

On Mar 11, 2018 11:16 PM, "Zhaoheng Ni" notifications@github.com wrote:

Because the energy of the noise is very high, we can't exclude the channel in mask combination and beamforming. Any suggestions for this case?

nateanl commented 6 years ago

I was thinking that if the speech is hidden in that microphone, we should find some way to detect it and mark the channel as failed. Maybe I was wrong. I can check the IAF mask to see the difference.
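One way such a detection could work is a simple correlation check. The following is a rough sketch only, assuming the channels are available as a `(n_channels, n_samples)` NumPy array; the function name `channel_correlations`, the 0.3 threshold, and the synthetic signals are all hypothetical, and the check ignores inter-channel delays, which would matter for a real microphone array.

```python
import numpy as np

def channel_correlations(x):
    """Correlate each channel with the mean of the remaining channels.

    x: array of shape (n_channels, n_samples).
    Returns one Pearson-style correlation per channel (zero lag only).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)   # remove per-channel DC offset
    n = x.shape[0]
    total = x.sum(axis=0)
    corr = np.empty(n)
    for c in range(n):
        ref = (total - x[c]) / (n - 1)      # mean of the other channels
        denom = np.linalg.norm(x[c]) * np.linalg.norm(ref)
        corr[c] = np.dot(x[c], ref) / denom if denom > 0 else 0.0
    return corr

# Synthetic stand-in data (not CHiME3 audio): five channels share a source,
# one channel contains only loud, unrelated noise.
rng = np.random.default_rng(0)
n_samples = 16000
source = rng.standard_normal(n_samples)
x = np.stack([source + 0.1 * rng.standard_normal(n_samples) for _ in range(6)])
x[1] = 3.0 * rng.standard_normal(n_samples)   # "channel 2": source hidden

corr = channel_correlations(x)
suspect = np.flatnonzero(corr < 0.3) + 1      # 1-based channel numbers
print("correlations:", np.round(corr, 2))
print("suspect channels:", suspect)
```

A low correlation only suggests that a channel carries little of the signal the others share; whether it should be treated as failed is a separate decision.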

mim commented 6 years ago

In that case the model should predict a mask of all 0s. We shouldn't have to do anything differently to it.
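To illustrate why an all-zero mask may need no special handling, here is a small sketch. It assumes the per-channel masks are combined across channels with a median; the combination rule, the function name `combine_masks`, and the array shapes are assumptions for illustration, not the repository's code.

```python
import numpy as np

def combine_masks(masks, how="median"):
    """Combine per-channel masks of shape (n_channels, n_freq, n_frames)."""
    masks = np.asarray(masks, dtype=float)
    if how == "median":
        return np.median(masks, axis=0)
    if how == "max":
        return masks.max(axis=0)
    if how == "mean":
        return masks.mean(axis=0)
    raise ValueError(f"unknown combination rule: {how}")

# Synthetic masks: six channels that roughly agree on a hypothetical mask.
rng = np.random.default_rng(0)
base = rng.random((4, 5))                      # 4 bins x 5 frames, made up
masks = np.stack([
    np.clip(base + 0.01 * rng.standard_normal(base.shape), 0.0, 1.0)
    for _ in range(6)
])
masks[1] = 0.0                                 # "channel 2": all-zero mask

good = np.delete(masks, 1, axis=0)             # the five remaining channels
for how in ("median", "max", "mean"):
    combined = combine_masks(masks, how)
    deviation = np.abs(combined - good.mean(axis=0)).max()
    print(f"{how}: max deviation from the other channels' mean = {deviation:.3f}")
```

Under these assumptions the median and max stay within the range of the other five channels, while a plain mean is scaled down by 5/6.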

On Mon, Mar 12, 2018 at 10:07 AM, Zhaoheng Ni notifications@github.com wrote:

I was thinking that if the speech is hidden in that microphone, we should find some way to detect it and mark the channel as failed. Maybe I was wrong. I can check the IAF mask to see the difference.

nateanl commented 6 years ago

These are the speech mask, noise mask, and post-filter generated from the 6-channel LSTM masks for one utterance.

Speech mask vs. noise mask: 119016 out of 119529 points are different.
Speech mask vs. post-filter: 119249 out of 119529 points are different.
Noise mask vs. post-filter: 119249 out of 119529 points are different.

[Figure: lstm_masks]
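Counts like these can be obtained with an element-wise comparison. The following is a minimal sketch, assuming each mask is a NumPy array of the same shape; the function name `count_different`, the `tol` parameter, and the 513 x 233 shape (which happens to multiply to 119529) are illustrative assumptions, not taken from the repository.

```python
import numpy as np

def count_different(a, b, tol=0.0):
    """Return (n_different, n_total) for two same-shaped masks.

    With tol=0.0 any floating-point difference counts; a small tol
    (e.g. 1e-3) counts only points that differ by more than tol.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    n_diff = int(np.count_nonzero(np.abs(a - b) > tol))
    return n_diff, a.size

# Synthetic stand-in masks (not the masks from this issue).
rng = np.random.default_rng(0)
speech_mask = rng.random((513, 233))   # hypothetical shape
noise_mask = 1.0 - speech_mask         # hypothetical complementary mask

n_diff, n_total = count_different(speech_mask, noise_mask)
print(f"{n_diff} out of {n_total} points are different")
```

With `tol=0.0`, two floating-point masks will differ at almost every point unless they are exact copies, so a nonzero tolerance can be more informative.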

nateanl commented 6 years ago

Sorry, I guess you wanted to know whether the masks of channels 3-6 are the same. I compared them against each other, and they are different too.

mim commented 6 years ago

Ok, I guess they are very slightly different. Let's see what we get running them through the system.
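To put a number on "very slightly different", one option is to measure the size of the differences rather than only whether they exist. This is a hedged sketch, assuming the per-channel masks are stacked in a `(n_channels, n_freq, n_frames)` array; `pairwise_mean_abs_diff` and the synthetic data are hypothetical and only mimic the pattern described in this thread.

```python
import numpy as np

def pairwise_mean_abs_diff(masks):
    """Mean absolute difference between every pair of channel masks.

    masks: array of shape (n_channels, n_freq, n_frames).
    Returns a symmetric (n_channels, n_channels) matrix with a zero diagonal.
    """
    masks = np.asarray(masks, dtype=float)
    flat = masks.reshape(masks.shape[0], -1)
    # Broadcast to (n_channels, n_channels, n_points), then average the points.
    return np.abs(flat[:, None, :] - flat[None, :, :]).mean(axis=2)

# Synthetic masks: channels 3-6 nearly identical, channels 1 and 2 unrelated.
rng = np.random.default_rng(0)
base = rng.random((8, 10))
similar = [
    np.clip(base + 1e-3 * rng.standard_normal(base.shape), 0.0, 1.0)
    for _ in range(4)
]
masks = np.stack([rng.random(base.shape), rng.random(base.shape)] + similar)

D = pairwise_mean_abs_diff(masks)
print(np.round(D, 3))
```

Small off-diagonal values within a group of channels indicate masks that differ at many points but only by tiny amounts.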