robust-ml / robust-ml.github.io

A community-run reference for state-of-the-art adversarial example defenses.
https://www.robust-ml.org/
Creative Commons Attribution Share Alike 4.0 International

Provable Robustness of ReLU networks via Maximization of Linear Regions #4

Closed by max-andr 5 years ago

max-andr commented 5 years ago

Name: Provable Robustness of ReLU networks via Maximization of Linear Regions

Authors: Francesco Croce, Maksym Andriushchenko, Matthias Hein

Paper: https://arxiv.org/abs/1810.07481

Code: https://github.com/max-andr/provable-robustness-max-linear-regions

Venue: AISTATS 2019

Does the code implement the robust-ml API and include pre-trained models: yes

Dataset: MNIST

Threat model: Linf (ϵ=0.1)

Natural accuracy: 98.81% accuracy on the full test set

Claims: 96.4% provable accuracy on the first 1000 test points

anishathalye commented 5 years ago

Thank you for the submission! Added in bdf79d1ec6.

Please let us know if you run the verification procedure over the full test set and have an updated number.

max-andr commented 5 years ago

Ok, now we have results for the full test set of MNIST:

Upper bound on robust accuracy (empirical): 96.42%
Lower bound on robust accuracy (provable): 96.37%

And would it be possible to indicate in the table the distinction between empirical and provable robustness? (ideally by adding two numbers to the corresponding cell)

For example, "Provable defenses against adversarial examples via the convex outer adversarial polytope" (Wong & Kolter) also reports provable accuracy (somewhat implied by the title :-) ), but that is not indicated in the table. The point is to distinguish the numbers that could still be reduced by a stronger attack (upper bounds on adversarial accuracy) from the numbers for which any reduction would mean a mistake in the proof or code (lower bounds on adversarial accuracy).
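To make the distinction concrete, here is a minimal sketch of how the two numbers could be computed from per-point results. The `PointResult` fields and the `robust_accuracy_bounds` helper are hypothetical names used for illustration only; they are not part of robustml or of the paper's code, and the sample data is made up.

```python
from dataclasses import dataclass


@dataclass
class PointResult:
    """Outcome for one test point (hypothetical structure, for illustration)."""
    correct: bool       # clean prediction matches the label
    attack_found: bool  # some attack found an adversarial example within epsilon
    certified: bool     # a verifier proved no adversarial example exists within epsilon


def robust_accuracy_bounds(results):
    """Return (lower, upper) bounds on robust accuracy as fractions in [0, 1]."""
    n = len(results)
    if n == 0:
        raise ValueError("need at least one test point")
    # Upper bound (empirical): points not yet broken. A stronger attack can only lower it.
    upper = sum(r.correct and not r.attack_found for r in results) / n
    # Lower bound (provable): points certified robust. It can only drop if the proof or code is wrong.
    lower = sum(r.correct and r.certified for r in results) / n
    return lower, upper


# Toy data: 4 points, not taken from the paper.
results = [
    PointResult(correct=True, attack_found=False, certified=True),
    PointResult(correct=True, attack_found=False, certified=False),
    PointResult(correct=True, attack_found=True, certified=False),
    PointResult(correct=False, attack_found=False, certified=False),
]
lower, upper = robust_accuracy_bounds(results)
print(f"provable (lower): {lower:.2%}, empirical (upper): {upper:.2%}")
```

The true robust accuracy lies between the two values. A point that is both certified and successfully attacked would signal a bug in the verifier or the attack, which is why the lower bound is the stronger claim.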

max-andr commented 5 years ago

We would also like to add the following two claims (#5 is related, since you would probably need to add these two datasets to the robustml framework first):

Dataset: Fashion-MNIST
Threat model: Linf (ϵ=0.1)
Natural accuracy: 85.50% accuracy on the full test set
Claims (both on the first 1000 test points): 73.4% empirical robust accuracy and 69.3% provable robust accuracy

Dataset: GTS (German Traffic Sign dataset)
Threat model: L2 (ϵ=0.2)
Natural accuracy: 84.65% accuracy on the full test set
Claims (both on the first 1000 test points): 67.9% empirical robust accuracy and 66.8% provable robust accuracy

anishathalye commented 5 years ago

Both of these sound good. I can add the provable bound soon. And we can add Fashion-MNIST/GTS once those datasets are included in robustml.

max-andr commented 5 years ago

I have also added the corresponding scripts with robustml interface for MNIST/FMNIST/GTS to the repository of our paper: https://github.com/max-andr/provable-robustness-max-linear-regions/tree/master/robustml

anishathalye commented 5 years ago

Apologies for the delay on this -- I should get around to merging all the things by mid next week.

anishathalye commented 5 years ago

Okay, we have merged the FMNIST/GTS pull request, so now we can make these changes.

As for formatting this in the table, do you have thoughts on what is a good way to do it? So far, we don't have any defenses in the list with more than one claim (/ dataset), so this hasn't come up before.

Should we have one row in the table per claim (/dataset)? This would result in a little bit of duplication (the first two columns), but that might be acceptable.

max-andr commented 5 years ago

Thanks! I also think that if you add all 3 datasets as basically separate claims, it would look a bit cluttered and redundant. In my opinion, the best representation would be to have a single Defense and Venue entry, and then to split each of the Dataset, Threat Model, Natural Accuracy and Claims cells into 3 subcells. What do you think about this?
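One way to picture this layout is the sketch below: a single defense entry holding several per-dataset claims, rendered so that the defense name and venue appear only once. The `Claim` and `Defense` classes and the `render_rows` helper are hypothetical and do not reflect how the robust-ml site actually stores or renders its table; the values are the ones reported in this thread.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Claim:
    dataset: str
    threat_model: str
    natural_accuracy: str
    claims: str


@dataclass
class Defense:
    name: str
    venue: str
    claims: List[Claim]


def render_rows(defense: Defense) -> List[Tuple[str, ...]]:
    """One row per claim; name and venue are filled in on the first row only."""
    rows = []
    for i, c in enumerate(defense.claims):
        first = i == 0
        rows.append((
            defense.name if first else "",
            defense.venue if first else "",
            c.dataset, c.threat_model, c.natural_accuracy, c.claims,
        ))
    return rows


defense = Defense(
    name="Provable Robustness of ReLU networks via Maximization of Linear Regions",
    venue="AISTATS 2019",
    claims=[
        Claim("MNIST", "Linf (eps=0.1)", "98.81%", "96.42% empirical / 96.37% provable"),
        Claim("Fashion-MNIST", "Linf (eps=0.1)", "85.50%", "73.4% empirical / 69.3% provable"),
        Claim("GTS", "L2 (eps=0.2)", "84.65%", "67.9% empirical / 66.8% provable"),
    ],
)
rows = render_rows(defense)
for row in rows:
    print(" | ".join(row))
```

Keeping the claims as a list under one defense avoids repeating the first two columns while still giving each dataset its own threat model and accuracy figures.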

anishathalye commented 5 years ago

Good idea! See #8 for the proposed change, and let us know what you think.

anishathalye commented 5 years ago

Addressed by #8. Thanks again for your feedback and contributions.

max-andr commented 5 years ago

Thanks a lot for adding the results!

There is a minor typo: for GTS the threat model is actually L2 (ϵ=0.2), not Linf.

anishathalye commented 5 years ago

Oops, fixed.