kan-bayashi / ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch
https://kan-bayashi.github.io/ParallelWaveGAN/
MIT License

Spectral convergence loss, what does it measure? #64

Open · PetrochukM opened this issue 4 years ago

PetrochukM commented 4 years ago

Hi there!

I'd like to better understand spectral convergence loss. In the literature, these are the mentions I have found so far:

"SC loss emphasizes highly on large spectral components, which helps especially in the early phases of training." ("Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks", https://arxiv.org/pdf/1808.06719.pdf)

"Because the spectral convergence loss emphasizes spectral peaks and the log STFT magnitude loss accurately fits spectral valleys..." ("Probability Density Distillation with Generative Adversarial Networks for High-Quality Parallel Waveform Generation", https://arxiv.org/pdf/1904.04472.pdf)

The above explanations for including the loss function are fairly vague and short. Furthermore, I am unable to find any mention of a similar loss elsewhere in the literature. To better describe the loss, I searched for "relative spectral power" (for reference: the spectrogram raised to the power of two is the "power spectrogram", and the sum of the power spectrogram is the "power spectral density").

Lastly, the paper "DDSP: Differentiable Digital Signal Processing" from Google Brain (https://arxiv.org/pdf/2001.04643.pdf) uses only a spectrogram magnitude loss. It does not train with a spectral convergence loss, and their results are pretty good.
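For concreteness, a magnitude-only objective of that kind boils down to something like the following multi-scale sketch (my own illustration, not the paper's exact implementation; the FFT sizes and hop lengths are arbitrary choices):

```python
import torch
import torch.nn.functional as F

def multiscale_magnitude_loss(pred, target, fft_sizes=(2048, 1024, 512, 256)):
    """L1 distance between magnitude spectrograms at several STFT resolutions.

    pred and target are batches of waveforms with shape (B, T).
    """
    loss = 0.0
    for n_fft in fft_sizes:
        window = torch.hann_window(n_fft, device=pred.device)
        p = torch.stft(pred, n_fft, hop_length=n_fft // 4, window=window,
                       return_complex=True).abs()
        t = torch.stft(target, n_fft, hop_length=n_fft // 4, window=window,
                       return_complex=True).abs()
        loss = loss + F.l1_loss(p, t)
    return loss
```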

Have you tried training without the spectral convergence loss? Is there literature I am missing that validates it as a perceptual loss?

PetrochukM commented 4 years ago

Related issue: https://github.com/magenta/ddsp/issues/12

hdmjdp commented 3 years ago

Have you tried it without the spectral convergence loss?

stonelazy commented 2 years ago

I'm not sure whether this answer will still be helpful now; I'm just sharing my understanding of spectral convergence loss, and it may or may not answer the question you raised.
Spectral convergence loss is closely related to the plain spectral loss. The spectral loss is just an L1/L2 norm over the difference between the log magnitude spectrograms of the two signals, whereas SCL is the Euclidean distance between the magnitude spectrograms of the target and denoised signals, normalized by the Euclidean length (Frobenius norm) of the target's magnitude spectrogram.
Since it is Euclidean, it penalises heavily when even one frequency region is far from the target, and it is forgiving of smaller differences.
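In code, the distinction between the two losses looks roughly like this (a minimal PyTorch sketch following the definitions above; the STFT parameters are illustrative):

```python
import torch
import torch.nn.functional as F

def stft_magnitude(x, fft_size=1024, hop_size=256):
    """Magnitude spectrogram of a batch of waveforms with shape (B, T)."""
    window = torch.hann_window(fft_size, device=x.device)
    spec = torch.stft(x, fft_size, hop_length=hop_size, window=window,
                      return_complex=True)
    return spec.abs().clamp(min=1e-7)  # clamp so the log below is finite

def spectral_convergence_loss(pred_mag, target_mag):
    """||S - S_hat||_F / ||S||_F: a relative error dominated by the
    largest-magnitude (peak) bins of the target spectrogram."""
    return torch.norm(target_mag - pred_mag, p="fro") / torch.norm(target_mag, p="fro")

def log_stft_magnitude_loss(pred_mag, target_mag):
    """L1 on log magnitudes: the log compresses peaks, so low-energy
    (valley) bins contribute on a comparable scale."""
    return F.l1_loss(torch.log(pred_mag), torch.log(target_mag))
```

The normalization by the Frobenius norm of the target is what makes SC a relative error: a mismatch at a strong harmonic peak moves the loss far more than the same absolute mismatch in a quiet region, which matches the "emphasizes spectral peaks" description quoted earlier in this thread.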