A repository that implements the projected sinkhorn algorithm and applies it to generating Wasserstein adversarial examples. Created by Eric Wong, in joint work with Frank R. Schmidt and Zico Kolter. See our paper on arXiv here.
`lambertw` function (scipy documentation).

You can install these functions with `pip install projected_sinkhorn`. The package contains the following functions:

- `projected_sinkhorn(X, Y, C, epsilon, lam, verbose=False, plan=False, objective='2norm', maxiters=50, return_objective=False)` computes the projection of `Y` onto the Wasserstein ball around `X`.
- `conjugate_sinkhorn(X, Y, C, epsilon, lam, verbose=False, plan=False, objective='2norm', maxiters=50, return_objective=False)` computes the support function (conjugate) of the Wasserstein ball.
- `wasserstein_cost(X, p=2, kernel_size=5)` creates a cost matrix for the p-Wasserstein distance for a given kernel size.
- `lambertw(z0, tol=1e-5)` computes the lambertw function of `z0` on the zero branch. The code is a direct port of the zero branch of the scipy version.

While much work in adversarial examples research has focused on norm-bounded perturbations, these types of perturbations largely ignore structure that we typically believe to exist in the data. For example, in images we can consider transformations such as translations, rotations, or distortions to be small, adversarial changes, yet these types of transformations can be extremely large when measured with respect to some p-norm. This work represents a step towards describing convex perturbation regions: convex sets of allowable perturbations beyond the norm-ball, which can capture structure or invariants in the application domain.
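
To make the point about p-norms concrete, here is a minimal sketch (not part of the package) that translates a toy one-dimensional "image" by a single pixel and measures the change in the l-infinity and l2 norms. The helper names `lp_norm` and `shift_right` and the toy signal are assumptions made up for this illustration.

```python
# Illustrative only: a tiny 1-D "image" with all of its mass in one pixel.

def lp_norm(v, p):
    """p-norm of a list of floats; p = float('inf') gives the max norm."""
    if p == float("inf"):
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def shift_right(v, k=1):
    """Translate a 1-D signal k pixels to the right, padding with zeros."""
    return [0.0] * k + v[: len(v) - k]


image = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
shifted = shift_right(image, 1)          # the same content, moved one pixel
diff = [a - b for a, b in zip(shifted, image)]

linf = lp_norm(diff, float("inf"))       # largest per-pixel change
l2 = lp_norm(diff, 2)                    # Euclidean size of the change

print("l-inf change:", linf)
print("l2 change:", l2)
```

With pixel values in [0, 1], the l-infinity change here is as large as a per-pixel change can be, even though the content only moved by one pixel.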
To this end, we propose adversarial examples which are close in Wasserstein distance. For images, this has an interpretation of moving pixel mass: two images that are close in Wasserstein distance require moving only a small amount of pixel mass a small distance to transform one image to the other. Examples of image transformations that are small in Wasserstein distance include rotations and translations. In practice, we find that adversarial examples generated within this ball have perturbations that reflect the actual content and structure of the image itself. For example, in the following figure we can see a Wasserstein perturbation on the top row, which doesn't attack the empty space around the six, vs an l-infinity perturbation on the bottom row, which attacks all pixels indiscriminately.
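
The "moving pixel mass" interpretation can be checked by hand in one dimension. The sketch below is an illustration under simplifying assumptions, not the projected sinkhorn algorithm: it uses the closed form for the 1-Wasserstein distance between two histograms on a unit-spaced 1-D grid (the sum of absolute differences between their cumulative sums). The function name `wasserstein1_1d` and the toy histograms are hypothetical.

```python
from itertools import accumulate


def wasserstein1_1d(a, b):
    """W1 between two histograms of equal total mass on a unit-spaced 1-D grid.

    In one dimension the optimal transport cost equals the sum of absolute
    differences between the two cumulative sums.
    """
    if abs(sum(a) - sum(b)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    cdf_gap = accumulate(x - y for x, y in zip(a, b))
    return sum(abs(g) for g in cdf_gap)


image = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]   # all of the mass moved one pixel
partial = [0.0, 0.0, 0.9, 0.1, 0.0, 0.0]   # 10% of the mass moved one pixel

w_shift = wasserstein1_1d(image, shifted)
w_partial = wasserstein1_1d(image, partial)

print("all mass, one pixel:", w_shift)
print("10% of mass, one pixel:", w_partial)
```

The cost scales with both how much mass moves and how far it moves, which is why a small translation stays close in Wasserstein distance.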
We derived a fast, modified sinkhorn iteration that solves the projection problem onto the Wasserstein ball, and restricted our transport plans to local regions to make this tractable for image datasets. The resulting algorithm is fast enough to be run as a subroutine within a PGD adversary, and furthermore within an adversarial training loop. For CIFAR10 classifiers, we find that an adversarial radius of 0.1 is enough to fool the classifier 97% of the time (equivalent to allowing the adversary to move 10% of the mass one pixel), when restricted to local 5 by 5 transport plans. The main experimental results in the paper can be summarized in the following table.
| Model | CIFAR10 Acc | CIFAR10 Adv Acc (eps=0.1) | MNIST Acc | MNIST Adv Acc (eps=1.0) |
|---|---|---|---|---|
| Standard | 95% | 3% | 99% | 4% |
| l-inf robust | 66% | 61% | 98% | 48% |
| Adv training | 81% | 76% | 97% | 86% |
| Binarization | - | - | 99% | 14% |
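
The reading of the radius given above (a radius of 0.1 corresponds to moving 10% of the mass one pixel) can be reproduced with a small sketch of a local 5 by 5 cost matrix. This is an assumption-laden illustration, not the package's `wasserstein_cost`: it assumes that the cost of moving mass from the kernel center to an offset is the Euclidean pixel distance raised to the power p, and the name `local_cost_matrix` is hypothetical.

```python
def local_cost_matrix(p=2, kernel_size=5):
    """Cost of moving one unit of mass from the kernel center to each offset.

    Assumption: cost = (Euclidean pixel distance to the center) ** p.
    """
    if kernel_size % 2 != 1:
        raise ValueError("kernel_size must be odd so the kernel has a center")
    c = kernel_size // 2
    return [
        [((i - c) ** 2 + (j - c) ** 2) ** (p / 2) for j in range(kernel_size)]
        for i in range(kernel_size)
    ]


C = local_cost_matrix(p=2, kernel_size=5)
center = 2

one_pixel_cost = C[center][center + 1]   # moving mass to an adjacent pixel
budget_used = 0.1 * one_pixel_cost       # moving 10% of the mass that far

print("cost per unit mass, one pixel:", one_pixel_cost)
print("budget used by moving 10% of the mass one pixel:", budget_used)
```

Under this assumption, moving mass to a corner of the 5 by 5 window is the most expensive move the local transport plan allows.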