zhangcen456 / IB


Confusion of the effectiveness of IB #1

Open Opdoop opened 1 year ago

Opdoop commented 1 year ago

From [1], we have:

$$\mathcal{L}=\mathcal{L}_{\mathrm{CE}}+\beta \cdot D_{\mathrm{KL}}[P(\mathbf{T} \mid \mathbf{X}) \,\|\, Q(\mathbf{T})]$$
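For concreteness, here is a minimal PyTorch-style sketch of how I read this objective, assuming a diagonal Gaussian encoder for $P(\mathbf{T} \mid \mathbf{X})$ and a standard normal prior $Q(\mathbf{T})$; the names `mu`, `log_var`, and `beta` are mine, not from your code:

```python
import torch
import torch.nn.functional as F

def vib_loss(logits, labels, mu, log_var, beta=0.1):
    """Cross-entropy plus a KL term pushing the stochastic representation
    P(T|X) = N(mu, diag(exp(log_var))) toward a standard normal prior Q(T)."""
    ce = F.cross_entropy(logits, labels)
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch.
    kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1.0).sum(dim=-1).mean()
    return ce + beta * kl
```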

For comparison with conventional adversarial training, from [2] we have: $$\min \left[\underset{x, y \sim p_{\mathcal{D}}}{\mathbb{E}}\left[\max_{\hat{x} \in \mathbb{B}(x)} \mathcal{L}(x, \hat{x}, y)\right]\right]$$

where the above objective can be instantiated in a semi-supervised fashion as: $$-\log q\left(y \mid F_{s}(x)\right)+\beta\, \mathrm{KL}\left(q\left(\cdot \mid F_{s}(x)\right) \,\|\, q\left(\cdot \mid F_{s}(\hat{x})\right)\right)$$
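And a corresponding sketch of that AT-style objective: cross-entropy on the clean input plus a KL term between the output distributions on the clean and adversarial embeddings. The names `model`, `x_embed`, and `x_adv_embed` are placeholders (the adversarial embedding is assumed to come from the inner maximization, e.g. PGD), not identifiers from [2]:

```python
import torch
import torch.nn.functional as F

def at_kl_loss(model, x_embed, x_adv_embed, labels, beta=1.0):
    """Cross-entropy on the clean embedding F_s(x) plus
    KL( q(.|F_s(x)) || q(.|F_s(x_hat)) ) against the adversarial embedding."""
    logits_clean = model(x_embed)
    logits_adv = model(x_adv_embed)
    ce = F.cross_entropy(logits_clean, labels)
    # F.kl_div(input, target) computes KL(target || input) with input in log space,
    # so this is the KL between the clean and adversarial output distributions.
    kl = F.kl_div(
        F.log_softmax(logits_adv, dim=-1),
        F.softmax(logits_clean, dim=-1),
        reduction="batchmean",
    )
    return ce + beta * kl
```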

The IB principle and adversarial training (AT) both introduce a regularization term that smooths the model's loss landscape. The only difference is the reference distribution in the KL term: variational IB uses a Gaussian prior, while AT uses the output distribution on adversarial examples.
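To make that parallel explicit, the way I read it, both objectives share the form "cross-entropy + β · KL", differing only in the second argument of the KL:

$$\mathcal{L}_{\mathrm{IB}}=\mathcal{L}_{\mathrm{CE}}+\beta\, D_{\mathrm{KL}}[P(\mathbf{T} \mid \mathbf{X}) \,\|\, Q(\mathbf{T})], \qquad \mathcal{L}_{\mathrm{AT}}=\mathcal{L}_{\mathrm{CE}}+\beta\, \mathrm{KL}[q(\cdot \mid F_{s}(x)) \,\|\, q(\cdot \mid F_{s}(\hat{x}))]$$

with $Q(\mathbf{T})$ a fixed Gaussian prior in the first case and the adversarially perturbed output distribution in the second.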

Adversarial examples can be viewed as a special kind of out-of-distribution (OOD) data. In this view, compared with IB, AT should give a tighter bound for OOD optimization. Yet in your experimental results, IB surpasses all previous AT-like methods. How could a looser bound be better than a tighter one? This really confuses me. Is there something I misunderstood?

[1] Improving the Adversarial Robustness of NLP Models by Information Bottleneck
[2] How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?

zhangcen456 commented 1 year ago

Our method is designed to filter out the features of the input that can be easily affected by attack methods, which makes the model more robust, whereas adversarial training adds perturbations to the input embeddings to generate extra examples. We believe that is the main difference between our method and AT. Moreover, for adversarial training, new adversarial examples can still be generated against adversarially trained networks [1].

[1] Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey

Opdoop commented 1 year ago

Thanks for your reply. I understand the differences between your method and AT, in terms of both motivation and technical solution.

As I described above, what I can see from the optimization objective is that the only difference lies in the reference distribution in the KL term: variational IB uses a Gaussian prior, while AT uses adversarial examples.

My question is why the robustness of IB can beat that of AT. Any insight or further explanation is welcome.

zhangcen456 commented 1 year ago

Different adversarial examples may not belong to the same distribution (new adversarial examples can still be generated against the model after adversarial training), so adversarial training is not necessarily a tighter bound.

Opdoop commented 1 year ago

Yes, indeed, different adversarial examples may not belong to the same distribution (new adversarial examples can still be generated against the model after adversarial training). But based on this argument, are you saying that a Gaussian restriction is generally better than AT? What assumption supports that?

Let's go a step further. If we accept this argument, what do you think the criteria could be for designing a better bound?