Closed: feng-bao-ucsf closed this issue 3 years ago
Dear Feng, You are correct: in the example that we report on our GitHub the two encoders (encoder_1 and encoder_2) completely share their parameters, as explained in the last paragraph of section 3.3. This is possible because the two views have the same marginal distribution p(v_1)=p(v_2).
If your two views of interest differ from each other, you can simply initialize encoder_2 (with parameters psi) with a different architecture and add its parameters to the optimizer. In general, parameter sharing is beneficial because it speeds up training and yields a better gradient estimate, so if your two views have something in common you can also use partial parameter sharing.
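As a rough sketch of what this could look like in PyTorch (the layer sizes, view dimensions, and learning rate below are just placeholders, not the values used in our repository):

```python
import torch
import torch.nn as nn

# Hypothetical view dimensions: the two views do not need to match.
view_1_dim, view_2_dim, z_dim = 784, 128, 64

# encoder_1 (parameters theta) and encoder_2 (parameters psi) can have
# different architectures when the two views have different marginals.
encoder_1 = nn.Sequential(
    nn.Linear(view_1_dim, 512), nn.ReLU(),
    nn.Linear(512, 2 * z_dim),  # e.g. mean and log-variance of q(z|v_1)
)
encoder_2 = nn.Sequential(
    nn.Linear(view_2_dim, 256), nn.ReLU(),
    nn.Linear(256, 2 * z_dim),  # e.g. mean and log-variance of q(z|v_2)
)

# Both parameter sets (theta and psi) are passed to the optimizer.
optimizer = torch.optim.Adam(
    list(encoder_1.parameters()) + list(encoder_2.parameters()),
    lr=1e-4,
)

# Partial sharing (assuming the views have compatible input shapes):
# a common backbone with view-specific heads, so only part of the
# parameters are shared between the two encoders.
shared_backbone = nn.Sequential(nn.Linear(view_1_dim, 512), nn.ReLU())
encoder_1_partial = nn.Sequential(shared_backbone, nn.Linear(512, 2 * z_dim))
encoder_2_partial = nn.Sequential(shared_backbone, nn.Linear(512, 2 * z_dim))
```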
I hope this helps to clarify your doubts. Best,
Marco
Hi Marco, thanks for the clarification.
Dear MIB authors
Thank you for releasing this great and clear implementation. I have a question regarding the definition of encoder_1 and encoder_2. In the code these two encoders share the same weights, but in the paper they are parameterized by theta and psi, respectively. Maybe I am misunderstanding something. Can you help me with this?
Best, Feng