santi-pdp / segan

Speech Enhancement Generative Adversarial Network in TensorFlow
MIT License

Could this model be called a real GAN? The discriminator might contribute nothing to the performance. #71

Open ANYMS-A opened 5 years ago

ANYMS-A commented 5 years ago

Hi there, I have recently been trying to reproduce the SEGAN model and have run into a few questions.

The biggest question is about the discriminator's loss function. As we know, the original GAN's discriminator performs binary classification, so it uses a sigmoid at the final output layer and binary cross-entropy as the loss. This model's discriminator seems to perform regression instead: its loss minimizes the distance between the outputs and 1 (or 0). So I suspect the discriminator contributes nothing to the final performance, and that minimizing the L1 loss between the clean speech and the generated speech is what makes the whole system work.
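To make the two discriminator objectives concrete, here is a minimal sketch in plain Python. It assumes the "distance to 1 (or 0)" loss is a squared distance (the least-squares GAN form); the function names `bce_d_loss` and `ls_d_loss` are illustrative and are not taken from the SEGAN code.

```python
import math


def softplus(v):
    """Numerically stable log(1 + exp(v))."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def bce_d_loss(real_logits, fake_logits):
    """Classic GAN discriminator loss: sigmoid + binary cross-entropy.

    -log(sigmoid(v)) equals softplus(-v) and -log(1 - sigmoid(v)) equals
    softplus(v), so the sigmoid never has to be evaluated explicitly.
    """
    real_term = sum(softplus(-v) for v in real_logits) / len(real_logits)
    fake_term = sum(softplus(v) for v in fake_logits) / len(fake_logits)
    return real_term + fake_term


def ls_d_loss(real_outputs, fake_outputs):
    """Least-squares discriminator loss (assumed form): push the raw
    outputs for real pairs toward 1 and for fake pairs toward 0."""
    real_term = sum((d - 1.0) ** 2 for d in real_outputs) / len(real_outputs)
    fake_term = sum(d ** 2 for d in fake_outputs) / len(fake_outputs)
    return 0.5 * real_term + 0.5 * fake_term
```

Both losses reward the same behaviour (high scores for real inputs, low scores for generated ones); they differ in how strongly they penalize outputs that are far from the target.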

So I discarded the discriminator and trained only the generator for speech enhancement, and it gives performance very close to that of the full SEGAN. When only the generator is trained, the model can be seen as a denoising autoencoder.
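The comparison described above can be sketched as a single generator objective with an optional adversarial term. This is only an illustration under stated assumptions: the adversarial term uses the least-squares form, `generator_loss` and `l1_distance` are hypothetical names, and the default `l1_weight=100.0` is an illustrative value, not something confirmed in this thread.

```python
def l1_distance(enhanced, clean):
    """Mean absolute error between enhanced and clean waveform samples."""
    return sum(abs(e - c) for e, c in zip(enhanced, clean)) / len(clean)


def generator_loss(enhanced, clean, d_fake_outputs=None, l1_weight=100.0):
    """Generator objective with an optional adversarial term.

    With d_fake_outputs=None the discriminator is dropped and only the
    weighted L1 reconstruction term remains (denoising-autoencoder case).
    Otherwise a least-squares adversarial term (assumed form) is added,
    pushing the discriminator's outputs on generated speech toward 1.
    """
    reconstruction = l1_weight * l1_distance(enhanced, clean)
    if d_fake_outputs is None:
        return reconstruction
    adversarial = 0.5 * sum((d - 1.0) ** 2 for d in d_fake_outputs) / len(d_fake_outputs)
    return adversarial + reconstruction
```

With a large `l1_weight`, the reconstruction term dominates the total unless the L1 error is already small, which is one reason the two training setups could end up with similar scores.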

I'm also unsure how much the discriminator contributes to the final performance during the adversarial process, because in speech enhancement we are not really "generating" anything but rather "mapping" a noisy signal to a clean signal.

Many thanks!

JUiscoming commented 4 years ago

I think the GAN loss contributes to the high-frequency band. Without the GAN loss, an MSE or L1 loss alone doesn't capture enough high-frequency information, because the high-frequency components carry little power.
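A toy illustration of the power argument, using synthetic tones rather than real speech: the amplitudes and frequencies below are made-up values chosen only to mimic a strong low band and a weak high band.

```python
import math

N = 1024  # number of samples in the toy signal


def tone(amplitude, cycles):
    """Sine wave completing `cycles` full periods over N samples."""
    return [amplitude * math.sin(2.0 * math.pi * cycles * n / N) for n in range(N)]


def mse(a, b):
    """Mean squared error between two equal-length sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


low = tone(1.0, 4)       # strong low-frequency component
high = tone(0.05, 200)   # weak high-frequency component
clean = [l + h for l, h in zip(low, high)]

# An estimate that drops the whole high band versus one that drops the low band.
mse_missing_high = mse(low, clean)
mse_missing_low = mse(high, clean)

print(mse_missing_high, mse_missing_low)
```

Dropping the high band entirely costs an MSE of about `0.05**2 / 2`, while dropping the low band costs about `0.5`, so a pure MSE objective has little incentive to reconstruct the weak high-frequency content.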