thomassutter / MoPoE

Code release for ICLR 2021 paper "Generalised Multimodal ELBO"

Confusion about MoE fusion #3

Open mistycheney opened 1 year ago

mistycheney commented 1 year ago

Thanks for producing this interesting paper. I have a question about the MoE part of the model and was wondering if you could clarify it.

In the paper, Eq. (6) is the objective to be optimized, which involves KL(sum_i expert_i || prior). Suppose each expert_i is Gaussian: how do you compute the KL divergence between a mixture of Gaussians and the prior? I don't think this has a closed form.
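To write the term I mean out explicitly (using π_i for the mixture weights and q_i for the experts; this is just my notation, the paper's may differ):

$$\mathrm{KL}\Big( \sum\nolimits_i \pi_i \, q_i(z \mid x_i) \;\Big\|\; p(z) \Big)$$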

I tried to find the answer in the code and ended up at moe_fusion and mixture_component_selection, which seem to perform some sort of sampling. Is this the same as the importance sampling in MMVAE (Shi et al., 2019)?

Any clarification would be much appreciated. Thank you.

thomassutter commented 1 year ago

Hi, thanks for your interest in our work.

We optimize an upper bound: instead of the KL divergence between the mixture and the prior, we take the weighted sum of the KL divergences between the individual experts and the prior distribution. For the sampling, we subsample the samples in each batch according to the mixture weights. If we assume equal weights for all experts, this boils down to reconstructing (batch_size / #experts) samples from every expert. I hope this clarifies things for you; otherwise, feel free to reach out again.
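Concretely, the bound follows from the convexity of the KL divergence: KL(Σ_i w_i q_i || p) ≤ Σ_i w_i KL(q_i || p). Below is a minimal sketch of both the bound and the subsampling, assuming Gaussian experts with diagonal covariances and a standard normal prior; the function names are illustrative and do not exactly match the ones in this repository.

```python
import torch

def kl_divergence_normal(mu, logvar):
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims.
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)

def mixture_kl_upper_bound(mus, logvars, weights):
    # Upper bound via convexity: KL( sum_i w_i q_i || p ) <= sum_i w_i KL( q_i || p ).
    # mus, logvars: lists of [batch, latent_dim] tensors, one entry per expert.
    kls = torch.stack([kl_divergence_normal(m, lv) for m, lv in zip(mus, logvars)])
    return (weights.view(-1, 1) * kls).sum(dim=0)  # shape: [batch]

def mixture_component_selection(mus, logvars, weights, batch_size):
    # Instead of evaluating the mixture density, assign each batch element to one
    # expert according to the mixture weights (deterministic split for simplicity).
    counts = (weights * batch_size).long()
    counts[-1] = batch_size - counts[:-1].sum()  # make the counts sum to batch_size
    mu_sel, logvar_sel, start = [], [], 0
    for k in range(len(mus)):
        end = start + counts[k].item()
        mu_sel.append(mus[k][start:end])
        logvar_sel.append(logvars[k][start:end])
        start = end
    return torch.cat(mu_sel, dim=0), torch.cat(logvar_sel, dim=0)

# Example with two experts and equal weights: each expert contributes
# batch_size / 2 samples, as described above.
batch_size, latent_dim, num_experts = 8, 4, 2
mus = [torch.randn(batch_size, latent_dim) for _ in range(num_experts)]
logvars = [torch.zeros(batch_size, latent_dim) for _ in range(num_experts)]
weights = torch.full((num_experts,), 1.0 / num_experts)

kl_bound = mixture_kl_upper_bound(mus, logvars, weights)  # [batch_size]
mu_joint, logvar_joint = mixture_component_selection(mus, logvars, weights, batch_size)
```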

Best regards, Thomas