tolstikhin / wae

Wasserstein Auto-Encoders
BSD 3-Clause "New" or "Revised" License

Question about MMD implementation #2

Open hiwonjoon opened 6 years ago

hiwonjoon commented 6 years ago

Thanks for sharing the code along with your amazing paper! I really enjoyed reading it.

I am interested in extending your work in another direction, and I have a question about the MMD part. I understand the overall concept, but I am not sure about the multi-scale part.

https://github.com/tolstikhin/wae/blob/068a25753d55c7dd3d130836702199bf59959c84/wae.py#L294

Are you just trying multiple kernels to get a better estimate of MMD?

It would also be very nice if you could recommend some reading to get a better understanding of MMDs.

tolstikhin commented 6 years ago

Dear Wonjoon,

Thank you for asking. The property we are using here is that the sum of positive definite kernels is also a positive definite kernel. We were initially using the IMQ (inverse multiquadratic) kernel with one fixed width parameter, but noticed that it works slightly better if you sum those kernels over a range of widths, which allows the kernel to simultaneously "look at various scales". This is a bit hand-wavy, but I hope it gives you the correct intuition.
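As a rough illustration of the idea above, here is a minimal NumPy sketch of a multi-scale IMQ kernel. It is not copied from `wae.py`: the function names, the `base_c` parameter, and the particular list of scales are assumptions made for the example.

```python
import numpy as np

def imq_kernel(x, y, c):
    """IMQ kernel k(x, y) = c / (c + ||x - y||^2) for all pairs of rows."""
    # Pairwise squared Euclidean distances, shape (len(x), len(y)).
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return c / (c + sq_dists)

def multiscale_imq_kernel(x, y, base_c,
                          scales=(0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0)):
    """Sum of IMQ kernels whose widths are base_c times each scale.

    The scale values are illustrative assumptions. Because each summand is
    positive definite, the sum is a positive definite kernel as well.
    """
    return sum(imq_kernel(x, y, base_c * s) for s in scales)
```

Calling `multiscale_imq_kernel(z, z, base_c)` on a batch `z` of shape `(n, d)` returns an `(n, n)` Gram matrix; its eigenvalues should be non-negative up to rounding error, which is a quick way to check the positive-definiteness property numerically.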

Regarding MMDs in general, I recommend looking at this overview: https://arxiv.org/pdf/1605.09522.pdf

Best wishes, Ilya

hiwonjoon commented 6 years ago

Thanks for the instant response! So the sigma of the kernel is not related to the sigma of the prior distribution. Is that correct?

tolstikhin commented 6 years ago

Correct, these are two different things. But you may want to choose the kernel width depending on your prior.
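One plausible way to tie the kernel width to the prior, assuming an isotropic Gaussian prior N(0, sigma_p^2 I) in d dimensions, is to set the IMQ constant to the expected squared distance between two independent prior samples, which equals 2 * d * sigma_p^2. The sketch below is an assumption-laden illustration rather than the repository's code; both function names are hypothetical.

```python
import numpy as np

def imq_base_width(latent_dim, prior_sigma):
    """Hypothetical heuristic: E||z - z'||^2 for z, z' ~ N(0, sigma^2 I)."""
    return 2.0 * latent_dim * prior_sigma ** 2

def empirical_sq_distance(latent_dim, prior_sigma, n_pairs=20000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = np.random.default_rng(seed)
    z1 = prior_sigma * rng.normal(size=(n_pairs, latent_dim))
    z2 = prior_sigma * rng.normal(size=(n_pairs, latent_dim))
    return float(np.mean(np.sum((z1 - z2) ** 2, axis=1)))
```

With this choice the kernel width scales with typical distances between prior samples, so the kernel stays informative when the latent dimension or the prior variance changes.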

ttgump commented 5 years ago

Thanks for the great discussion. I have a question. I trained my WAE model with the MMD penalty on some other datasets (not MNIST or CelebA), and I saw the MMD become negative after training for hundreds of epochs. Is it possible for the MMD penalty to be negative?

tolstikhin commented 5 years ago

The penalty used in WAE-MMD is not precisely the population MMD, but a sample-based U-statistic. Because it is an unbiased statistic (that is, its expected value coincides with the quantity of interest, the MMD in this case), it necessarily takes negative values from time to time when the population MMD is zero. In summary, yes, negative values are OK.
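The effect can be reproduced with a small simulation. The sketch below uses the standard unbiased U-statistic estimator of the squared MMD with a single IMQ kernel and draws both samples from the same Gaussian, so the population value is zero. The function names, sample sizes, and kernel constant are assumptions chosen for the illustration, not values from the WAE code.

```python
import numpy as np

def imq_gram(a, b, c):
    """IMQ Gram matrix with entries c / (c + ||a_i - b_j||^2)."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return c / (c + sq)

def mmd2_u_statistic(x, y, c):
    """Unbiased estimate of the squared MMD from two equal-sized samples."""
    n = x.shape[0]
    assert y.shape[0] == n, "this sketch assumes equal sample sizes"
    off_diag = 1.0 - np.eye(n)  # drop i == j terms in the within-sample sums
    within = (np.sum(imq_gram(x, x, c) * off_diag)
              + np.sum(imq_gram(y, y, c) * off_diag)) / (n * (n - 1))
    cross = 2.0 * np.mean(imq_gram(x, y, c))  # average over all (i, j) pairs
    return within - cross

def simulate_null_estimates(trials=200, n=50, dim=2, seed=0):
    """Repeat the estimate when both samples come from the same N(0, I)."""
    rng = np.random.default_rng(seed)
    c = 2.0 * dim  # assumed kernel constant for this toy setup
    return np.array([
        mmd2_u_statistic(rng.normal(size=(n, dim)),
                         rng.normal(size=(n, dim)), c)
        for _ in range(trials)
    ])
```

Inspecting `simulate_null_estimates()` shows estimates scattered around zero, with a sizeable share below it, which is exactly what unbiasedness requires when the true value is zero.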

ttgump commented 5 years ago

Thanks for the explanation! Should we consider the MMD to have converged once it reaches negative values? And when the MMD is negative, can we consider q(z|x) to be equal to the prior p(z)?

tolstikhin commented 5 years ago

Dear ttgump,

q(z|x) is not being matched to p(z) in WAE. Instead, the aggregate posterior is.
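A toy case may help separate the two notions. In the sketch below (a hypothetical setup, not taken from the WAE code), the data are one-dimensional standard Gaussians and the encoder is deterministic, so each q(z|x) is a point mass that looks nothing like the prior N(0, 1). The aggregate posterior, obtained by pooling the codes of all data points, nevertheless matches the prior.

```python
import numpy as np

def encode(x):
    """Hypothetical deterministic encoder: q(z|x) is a point mass at z = x."""
    return x

def aggregate_codes(n=10000, seed=0):
    """Pool the codes of n data points to sample the aggregate posterior."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 1))  # toy data distributed as N(0, 1)
    return encode(x)             # one code per data point
```

The pooled codes returned by `aggregate_codes()` have mean close to 0 and variance close to 1, like the prior, even though every individual q(z|x) has zero variance. In a WAE-MMD setup, the penalty compares such a batch of pooled codes against a batch of prior samples.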