felixkreuk closed this issue 3 years ago.
Hi,
First, I would like to thank you for your implementation. I have a question about the optimization. In the original paper, the authors propose minimizing the L2 distance between the noise and the network output. I noticed that the code uses the L1 loss instead. Is there a reason for that change?
Thank you, Felix
Another paper, WaveGrad, optimizes the L1 loss instead of the L2 loss. Its authors observed better training stability, and in my case I found that the L1 objective gave slightly better results.
There was no theoretical justification for choosing the L1 loss over the L2 loss here. It was an empirical decision, and the L2 loss will also suffice.
Thank you.
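For reference, here is a minimal sketch of the two objectives discussed in this thread, assuming the usual DDPM-style setup: a clean sample x0 is noised as x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, and the network is trained to predict eps. The names `forward_noise`, `noise_loss`, and `loss_grad` are hypothetical, the network itself (and its conditioning on the timestep t) is stubbed out, and losses are averaged per element rather than summed. This is an illustration, not the repository's actual code.

```python
import math
import random


def forward_noise(x0, eps, alpha_bar_t):
    """Noise a clean sample: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]


def noise_loss(eps, eps_hat, kind="l2"):
    """Mean per-element loss between the true noise and the predicted noise.

    kind="l2" is the squared error from the original objective;
    kind="l1" is the absolute error used instead in the code.
    """
    if len(eps) != len(eps_hat):
        raise ValueError("length mismatch")
    diffs = [e - p for e, p in zip(eps, eps_hat)]
    if kind == "l1":
        return sum(abs(d) for d in diffs) / len(diffs)
    if kind == "l2":
        return sum(d * d for d in diffs) / len(diffs)
    raise ValueError(f"unknown loss kind: {kind!r}")


def loss_grad(eps, eps_hat, kind="l2"):
    """Gradient of the mean loss with respect to each predicted element."""
    n = len(eps)
    grads = []
    for e, p in zip(eps, eps_hat):
        d = p - e
        if kind == "l1":
            # Sign of the residual: magnitude 1/n no matter how large d is
            # (subgradient 0 chosen at d == 0).
            g = (1.0 if d > 0 else -1.0 if d < 0 else 0.0) / n
        elif kind == "l2":
            # Grows linearly with the residual.
            g = 2.0 * d / n
        else:
            raise ValueError(f"unknown loss kind: {kind!r}")
        grads.append(g)
    return grads


if __name__ == "__main__":
    random.seed(0)
    x0 = [0.5, -0.2, 0.1, 0.9]
    eps = [random.gauss(0.0, 1.0) for _ in x0]
    x_t = forward_noise(x0, eps, alpha_bar_t=0.7)
    # Hypothetical stand-in for the denoising network: always predicts zero noise.
    eps_hat = [0.0 for _ in x_t]
    print("L1:", noise_loss(eps, eps_hat, "l1"))
    print("L2:", noise_loss(eps, eps_hat, "l2"))
```

The only difference between the two objectives is the per-element penalty. With L1, each element's gradient has a fixed magnitude (1/n), while with L2 it scales with the residual, so large errors produce proportionally larger updates. That is one common intuition for why L1 can train more stably, but as noted above, the choice here was empirical, and swapping `kind="l1"` for `kind="l2"` should also work.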