mfederici / Multi-View-Information-Bottleneck

Implementation of Multi-View Information Bottleneck

Derivation of loss MIB #7

Closed: FengxinLee closed this issue 2 years ago

FengxinLee commented 2 years ago

Hello, nice work! I'm interested in the derivation of the loss function in Appendix F. Just two small questions: (1) How do you obtain the second term of equation (6)? Does it use the chain rule (P3)? (2) I also failed to follow the derivation of the first term, particularly its first step. Could you give more details or point to other works to follow? Looking forward to your reply.

mfederici commented 2 years ago

Hi, thanks for your interest in our work and for the relevant questions. We identified a couple of typos in Appendix F that will be corrected in a future revision of the paper.

1) One step of the derivation is not reported in the appendix: minimizing I(v_1; v_2 | z_1) with respect to theta is equivalent to maximizing I(z_1; v_2), since min_theta I(v_1; v_2 | z_1) = I(v_1; v_2) - max_theta I(z_1; v_2) and I(v_1; v_2) does not depend on theta. The identity follows from the chain rule of mutual information together with the fact that z_1 depends on v_2 only through v_1 (see the sketch below). From there, the last part of the derivation is reported at the bottom of the page.
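A minimal sketch of the omitted step, assuming (as in the paper) that z_1 is sampled from p_theta(z_1 | v_1), so that I(z_1; v_2 | v_1) = 0:

```latex
% Expand I(v_1, z_1; v_2) with the chain rule in two different orders.
% Assumes z_1 ~ p_theta(z_1 | v_1), hence I(z_1; v_2 | v_1) = 0.
\begin{align}
  I(v_1, z_1; v_2) &= I(v_1; v_2) + \underbrace{I(z_1; v_2 \mid v_1)}_{=\,0}
                    = I(v_1; v_2) \\
  I(v_1, z_1; v_2) &= I(z_1; v_2) + I(v_1; v_2 \mid z_1) \\
  \Rightarrow \quad I(v_1; v_2 \mid z_1) &= I(v_1; v_2) - I(z_1; v_2)
\end{align}
% Since I(v_1; v_2) is a constant of the data, minimizing the left-hand
% side over theta is the same as maximizing I(z_1; v_2) over theta.
```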

2) The key point in the derivation of the first term is that we use the second encoder p_psi(z_2|v_2) as a variational approximation of p(z_1|v_2) := \int p(v_1|v_2) p_theta(z_1|v_1) dv_1. This is possible only because z_1 and z_2 are defined on the same space. Note that the variational gap in the second-to-last step of the derivation should be KL(p(z_1|v_2) || p_psi(z_2|v_2)) instead of KL(p(z_2|v_1) || p_psi(z_2|v_2)). The first step of the derivation consists in multiplying and dividing by p_psi(z_2|v_2), as sketched below.
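Reconstructing this step from the comment above (my reading of it, not a verbatim copy of Appendix F; p_psi(z_1 | v_2) denotes the density of the second encoder evaluated at z_1, which is well defined because z_1 and z_2 share the same space):

```latex
% Variational lower bound on I(z_1; v_2), using p_psi as an approximation
% of p(z_1 | v_2). The bound holds because the dropped KL term is >= 0.
\begin{align}
  I(z_1; v_2)
  &= \mathbb{E}_{p(z_1, v_2)}\!\left[\log \frac{p(z_1 \mid v_2)}{p(z_1)}\right] \\
  &= \mathbb{E}_{p(z_1, v_2)}\!\left[\log \frac{p_\psi(z_1 \mid v_2)}{p(z_1)}\right]
   + \mathbb{E}_{p(v_2)}\!\left[\mathrm{KL}\!\left(p(z_1 \mid v_2)\,\|\,p_\psi(z_1 \mid v_2)\right)\right] \\
  &\ge \mathbb{E}_{p(z_1, v_2)}\!\left[\log \frac{p_\psi(z_1 \mid v_2)}{p(z_1)}\right]
\end{align}
% The second equality is the "multiply and divide by p_psi" step inside
% the log; the dropped KL term is exactly the variational gap discussed
% above, KL(p(z_1|v_2) || p_psi(z_2|v_2)).
```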

I hope this helps to clarify things. If not, feel free to follow up with further questions :)

FengxinLee commented 2 years ago

Thank you for the response. I'm looking forward to following your future work.