locuslab / projected_sinkhorn


Wasserstein Examples via Projected Sinkhorn Iterates

This repository implements the projected Sinkhorn algorithm and applies it to generating Wasserstein adversarial examples. Created by Eric Wong, in joint work with Frank R. Schmidt and Zico Kolter. See our paper on arXiv here.

News

What is in this repository?

Installation & Usage

You can install these functions with pip install projected_sinkhorn. The package contains the following functions:

Why do we care about Wasserstein adversarial examples?

While much work in adversarial examples research has focused on norm-bounded perturbations, these types of perturbations largely ignore structure that we typically believe to exist in the data. For example, in images we can consider transformations such as translations, rotations, or distortions to be small, adversarial changes, yet these types of transformations can be extremely large when measured with respect to some p-norm. This work represents a step towards describing convex perturbation regions: convex sets of allowable perturbations beyond the norm-ball, which can capture structure or invariants in the application domain.

To this end, we propose adversarial examples which are close in Wasserstein distance. For images, this has an interpretation of moving pixel mass: two images that are close in Wasserstein distance require moving only a small amount of pixel mass a small distance to transform one image to the other. Examples of image transformations that are small in Wasserstein distance include rotations and translations. In practice, we find that adversarial examples generated within this ball have perturbations that reflect the actual content and structure of the image itself. For example, in the following figure we can see a Wasserstein perturbation on the top row, which doesn't attack the empty space around the six, vs an l-infinity perturbation on the bottom row, which attacks all pixels indiscriminately.
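The "moving pixel mass" interpretation can be illustrated with a small sketch. It is not part of the package: it assumes one-dimensional signals normalized to unit total mass on a unit-spaced grid, where the 1-Wasserstein distance equals the L1 distance between cumulative sums. The helper names `wasserstein_1d` and `one_hot` are hypothetical and chosen only for this example.

```python
import numpy as np

def wasserstein_1d(p, q):
    """1-Wasserstein distance between two histograms of equal mass on a unit-spaced 1D grid."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # In one dimension, W1 is the L1 distance between the cumulative sums.
    return np.abs(np.cumsum(p - q)).sum()

def one_hot(index, size=8):
    """A signal with all of its mass on a single 'pixel'."""
    x = np.zeros(size)
    x[index] = 1.0
    return x

base = one_hot(2)
near = one_hot(3)  # mass translated by one pixel
far = one_hot(7)   # mass translated by five pixels

# The l-infinity distance cannot tell the two translations apart.
linf_near = np.abs(base - near).max()
linf_far = np.abs(base - far).max()

# The Wasserstein distance grows with how far the mass has to move.
w_near = wasserstein_1d(base, near)
w_far = wasserstein_1d(base, far)

print(f"l-inf:       near={linf_near}, far={linf_far}")
print(f"Wasserstein: near={w_near}, far={w_far}")
```

Both translations look equally large under the l-infinity norm, while the Wasserstein distance scales with the distance the mass travels, which is why a small translation stays inside a small Wasserstein ball.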

We derive a fast, modified Sinkhorn iteration that solves the projection problem onto the Wasserstein ball, and we restrict our transport plans to local regions to make this tractable for image datasets. The resulting algorithm is fast enough to run as a subroutine within a PGD adversary, and furthermore within an adversarial training loop. For CIFAR10 classifiers, we find that an adversarial radius of 0.1 (equivalent to allowing the adversary to move 10% of the mass one pixel) is enough to fool the classifier 97% of the time when restricted to local 5 by 5 transport plans. The main experimental results in the paper can be summarized in the following table.

| | CIFAR10 Acc | CIFAR10 Adv Acc (eps=0.1) | MNIST Acc | MNIST Adv Acc (eps=1.0) |
|---|---|---|---|---|
| Standard | 95% | 3% | 99% | 4% |
| l-inf robust | 66% | 61% | 98% | 48% |
| Adv training | 81% | 76% | 97% | 86% |
| Binarization | - | - | 99% | 14% |
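For readers unfamiliar with Sinkhorn iterations, the following is a minimal sketch of the standard entropy-regularized Sinkhorn iteration for optimal transport between two fixed histograms. It is not the modified projection algorithm from the paper and does not use this package's functions. The name `sinkhorn_plan`, the regularization strength `reg`, and the toy histograms are assumptions made only for illustration.

```python
import numpy as np

def sinkhorn_plan(a, b, C, reg=1.0, n_iters=500):
    """Approximate transport plan between histograms a and b under cost matrix C.

    Standard Sinkhorn scaling (illustrative only): alternately rescale the
    Gibbs kernel so that the plan matches the column and row marginals.
    """
    K = np.exp(-C / reg)          # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)         # rescale to match column sums to b
        u = a / (K @ v)           # rescale to match row sums to a
    return u[:, None] * K * v[None, :]

# Toy problem: five "pixels" on a line, cost = distance the mass is moved.
n = 5
idx = np.arange(n)
C = np.abs(idx[:, None] - idx[None, :]).astype(float)
a = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # source mass
b = np.full(n, 0.2)                        # target mass

P = sinkhorn_plan(a, b, C)
cost = (P * C).sum()                       # total mass-times-distance moved

print("row sums:   ", P.sum(axis=1))
print("column sums:", P.sum(axis=0))
print("transport cost:", cost)
```

Each iteration costs only two matrix-vector products, which gives some intuition for why Sinkhorn-style updates are cheap enough to call repeatedly inside an attack loop. Because of the entropy regularization, the reported cost is an upper bound on the exact transport cost, which is 0.4 for this toy problem.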