thomassutter / MoPoE

Code release for ICLR 2021 paper "Generalised Multimodal ELBO"

Confusion about MoE fusion #3

Open mistycheney opened 1 year ago

mistycheney commented 1 year ago

Thanks for producing this interesting paper. I have a question about the MoE part of the model and was wondering if you could clarify it.

In the paper, Eq. (6) is the objective to be optimized, which involves KL(sum_i expert_i || prior). Suppose each expert_i is Gaussian: how do you compute the KL divergence between a mixture of Gaussians and the prior? I don't think this has a closed form.
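To write the term I mean out explicitly (using π_i for the mixture weights and q_i for the experts; this is just my notation, the paper's may differ):

$$\mathrm{KL}\Big( \sum\nolimits_i \pi_i \, q_i(z \mid x_i) \;\Big\|\; p(z) \Big)$$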

I tried to find the answer in the code and ended up at moe_fusion and mixture_component_selection, which seem to perform some sort of sampling. Is this the same as the importance sampling in MMVAE (Shi et al., 2019)?

Any clarification would be much appreciated. Thank you.

thomassutter commented 1 year ago

Hi, thanks for your interest in our work.

We optimize an upper bound: instead of the KL divergence between the mixture and the prior, we take the weighted sum of the KL divergences between the individual experts and the prior distribution. For the sampling, we subsample the samples in each batch according to the mixture weights. If we assume equal weights for all experts, this boils down to reconstructing (batch_size / #experts) samples from every expert. I hope this clarifies things for you; otherwise, feel free to reach out again.
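Concretely, the bound follows from the convexity of the KL divergence: KL(Σ_i w_i q_i || p) ≤ Σ_i w_i KL(q_i || p). Below is a minimal sketch of both the bound and the subsampling, assuming Gaussian experts with diagonal covariances and a standard normal prior; the function names are illustrative and do not exactly match the ones in this repository.

```python
import torch

def kl_divergence_normal(mu, logvar):
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims.
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)

def mixture_kl_upper_bound(mus, logvars, weights):
    # Upper bound via convexity: KL( sum_i w_i q_i || p ) <= sum_i w_i KL( q_i || p ).
    # mus, logvars: lists of [batch, latent_dim] tensors, one entry per expert.
    kls = torch.stack([kl_divergence_normal(m, lv) for m, lv in zip(mus, logvars)])
    return (weights.view(-1, 1) * kls).sum(dim=0)  # shape: [batch]

def mixture_component_selection(mus, logvars, weights, batch_size):
    # Instead of evaluating the mixture density, assign each batch element to one
    # expert according to the mixture weights (deterministic split for simplicity).
    counts = (weights * batch_size).long()
    counts[-1] = batch_size - counts[:-1].sum()  # make the counts sum to batch_size
    mu_sel, logvar_sel, start = [], [], 0
    for k in range(len(mus)):
        end = start + counts[k].item()
        mu_sel.append(mus[k][start:end])
        logvar_sel.append(logvars[k][start:end])
        start = end
    return torch.cat(mu_sel, dim=0), torch.cat(logvar_sel, dim=0)

# Example with two experts and equal weights: each expert contributes
# batch_size / 2 samples, as described above.
batch_size, latent_dim, num_experts = 8, 4, 2
mus = [torch.randn(batch_size, latent_dim) for _ in range(num_experts)]
logvars = [torch.zeros(batch_size, latent_dim) for _ in range(num_experts)]
weights = torch.full((num_experts,), 1.0 / num_experts)

kl_bound = mixture_kl_upper_bound(mus, logvars, weights)  # [batch_size]
mu_joint, logvar_joint = mixture_component_selection(mus, logvars, weights, batch_size)
```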

Best regards, Thomas