Closed: yanghr closed this issue 4 years ago

yanghr: Great work! Just a small question: I noticed that when evaluating the classification accuracy of the learned features, you use only the mean of the encoder output to construct the dataset, rather than resampling the features from the encoder distribution. Is this common practice for information bottleneck encoders, or do you have a specific reason for doing this? Thanks!

Marco Federici (mfederici): Thank you for expressing interest in our work and for the relevant question. Since the evaluation reported in this work measures the accuracy of a linear classifier, we can use the expectation of the posterior instead of a finite number of samples, thanks to the linearity of expectation: E[Wz] = W E[z]. Using a sampled version could be beneficial when the classifier z -> y is not linear, especially if the posterior p(z_1|v_1) is a multimodal distribution.
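For concreteness, here is a minimal sketch of the two evaluation options discussed above. This is not code from the repository: it assumes the encoder returns the mean and standard deviation of a Gaussian posterior p(z|v), and `encoder`, `loader`, and the variable names are hypothetical placeholders.

```python
# Sketch: building a feature dataset for linear evaluation, using either
# the posterior mean E[z|v] or a reparameterized sample z ~ p(z|v).
# Assumes encoder(v) -> (mu, sigma), the parameters of a Gaussian posterior.
import torch

def extract_features(encoder, loader, sample=False):
    feats, labels = [], []
    with torch.no_grad():
        for v, y in loader:
            mu, sigma = encoder(v)  # parameters of p(z|v)
            if sample:
                # one reparameterized sample from the posterior
                z = mu + sigma * torch.randn_like(sigma)
            else:
                # posterior mean; for a linear classifier Wz + b,
                # E[Wz + b] = W E[z] + b, so the mean suffices
                z = mu
            feats.append(z)
            labels.append(y)
    return torch.cat(feats), torch.cat(labels)
```

The comment in the `else` branch is the point of the answer above: because expectation commutes with a linear map, training and evaluating the linear classifier on the means is equivalent in expectation to averaging its predictions over posterior samples.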
Hi Marco,
Thanks a lot for your explanation! It answered my question perfectly. Looking forward to meeting you guys at ICLR!
Best, Huanrui