Closed by qiansichong 3 years ago
Thanks for the question. I have made similar observations, and I think this is reasonable. When you learn the density prior on mixtures of only two sources, the mixture density does not have enough diversity to learn good speech models, i.e., the problem is not challenging enough. Ideally, we want the number of mics to be large enough that the initial mixtures sound like babble noise. This forces the model to learn something real.
Consider the extreme case where the number of mics is 1. Then it is clear that no meaningful information can be learned.
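To illustrate the intuition above that mixing more sources makes the mixture more noise-like, here is a rough sketch. It is not code from this repo. It assumes each source can be crudely modeled as i.i.d. Laplacian samples (a common heavy-tailed stand-in for speech), and the helper names `excess_kurtosis` and `mixture_kurtosis` are made up for this example. Excess kurtosis is used as a simple measure of how far a signal is from Gaussian noise: values closer to 0 mean the signal looks more like plain noise.

```python
import numpy as np


def excess_kurtosis(x):
    """Excess kurtosis of a 1-D signal (0 for a Gaussian)."""
    x = x - x.mean()
    return float(np.mean(x**4) / np.mean(x**2) ** 2 - 3.0)


def mixture_kurtosis(num_sources, num_samples=200_000, seed=0):
    """Sum `num_sources` independent Laplacian signals and measure the result.

    Assumption: Laplacian samples are only a crude proxy for speech.
    """
    rng = np.random.default_rng(seed)
    sources = rng.laplace(size=(num_sources, num_samples))
    mixture = sources.sum(axis=0)  # equal-gain mixture of all sources
    return excess_kurtosis(mixture)


# Compare a single source with mixtures of 2, 4, and 5 sources.
results = {n: mixture_kurtosis(n) for n in (1, 2, 4, 5)}
for n, k in results.items():
    print(f"{n} source(s): excess kurtosis = {k:.2f}")
```

Under this toy model the excess kurtosis shrinks as more sources are added, so a mixture of 4 or 5 sources is much closer to noise than a mixture of 2. This only shows the statistical trend; it does not by itself explain the gap in separation performance.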
I have a question: when the number of microphones is set to 4 or 5, the separation performance of the trained model is normal, but when it is set to 2, the separation performance is very poor.