hughsalimbeni / bayesian_benchmarks

A community repository for benchmarking Bayesian methods
Apache License 2.0

adversarial examples #2

Open · hughsalimbeni opened 6 years ago

hughsalimbeni commented 6 years ago

A key work in this area is https://github.com/YingzhenLi/Dropout_BBalpha

A problem with implementing this method here is that it needs model gradients. Either we could build a task that supports multiple backends (not ideal) and take the gradients directly, or each model could provide its own gradients, which the generic benchmarking code could then manipulate in numpy. @YingzhenLi any thoughts?
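
To illustrate the second option: the generic attack code could stay in pure numpy and never touch the model's backend. A rough sketch of an FGSM step, assuming a hypothetical `grad_logp` method on the model (none of these names exist in the current API):

```python
import numpy as np

def fgsm_numpy(model, X, y, eps=0.1):
    """One FGSM step written purely in numpy (sketch only).

    Assumes a hypothetical model.grad_logp(X) that returns the (N, K, D)
    gradients of the log predictive probabilities wrt the inputs.
    X: (N, D) inputs, y: (N,) integer class labels.
    """
    grads = model.grad_logp(X)                    # (N, K, D)
    grad_true = grads[np.arange(len(y)), y, :]    # (N, D): d log p(y_n | x_n) / d x_n
    # step in the direction that decreases the log probability of the true class
    return X - eps * np.sign(grad_true)
```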

YingzhenLi commented 6 years ago

Indeed many attacks are based on gradients, but for classification that usually just means you need the logit vector before the softmax; automatic differentiation will then work out the gradients for you (if you use TensorFlow or PyTorch). I can definitely help if you want, since I already have some code that does this.
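
For example, something along these lines (just a sketch, assuming eager TensorFlow, that `model(x)` returns the pre-softmax logits, and that `x_batch`/`y_batch` are the inputs and integer labels being attacked):

```python
import tensorflow as tf

x = tf.convert_to_tensor(x_batch)        # (N, D) inputs to perturb
with tf.GradientTape() as tape:
    tape.watch(x)
    logits = model(x)                    # (N, K) values before the softmax
    loss = tf.reduce_sum(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_batch, logits=logits))
grad_x = tape.gradient(loss, x)          # (N, D): this is all a gradient-based attack needs
```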

There are some ideas for attacking Bayesian methods that actually require more than the logit vector. These might be slightly more involved to implement...

hughsalimbeni commented 6 years ago

Thanks @YingzhenLi! Certainly we want to use autodiff, but what I'm unsure of is whether the generic testing code should take the gradients itself, or whether we should just require all models to implement something like

def grad_logp(self, x):
    """
    The gradient of the log predictive probabilities wrt x, a single input.
    If x has shape (D,), then the output has shape (K, D), where K is the number of classes.
    """

or

def grad_logp(self, X):
    """
    The gradient of the log predictive probabilities wrt X, elementwise over the number of samples
    If x is shape (N, D), then the output is shape (N, K, D), where K is the number of classes 
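
For what it's worth, a TensorFlow-backed model could get the second version almost for free from autodiff. A rough sketch with a recent TensorFlow, assuming a hypothetical `predict_log_density` method that returns the (N, K) log predictive probabilities as a tensor:

```python
import tensorflow as tf

def grad_logp(self, X):
    # sketch: (N, K, D) Jacobian of the log predictive probabilities wrt X
    X = tf.convert_to_tensor(X)
    with tf.GradientTape() as tape:
        tape.watch(X)
        log_p = self.predict_log_density(X)   # (N, K), hypothetical method name
    return tape.batch_jacobian(log_p, X)      # (N, K, D)
```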
    """

Do you know of any references for continuous output models?

hughsalimbeni commented 6 years ago

Also, not all models use the softmax (e.g. robust max and probit), so any evaluation should be agnostic to the link function.

YingzhenLi commented 6 years ago

Some attacks do need the logits (like Carlini-Wagner L2), although I suspect using the values before the robust max or probit link might work.

Other attacks like FGSM/PGD/MIM only need the output probability vector.

My code looks something like the following:

def predict(self, X, softmax=True):
    y = self.model(X)    # values before the softmax (the logits)
    if softmax:
        y = tf.nn.softmax(y)
    return y

Then set softmax=True or False depending on the attack in use.
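
The calling code then just picks the output each attack is defined on, e.g. (hypothetical usage, with `x` the current adversarial candidate and `y` the integer labels):

```python
import tensorflow as tf

logits = model.predict(x, softmax=False)   # e.g. for Carlini-Wagner L2, which works on the logits
probs = model.predict(x, softmax=True)     # e.g. for FGSM / PGD / MIM

# an FGSM-style loss built directly from the probability vector
p_true = tf.gather(probs, y, axis=1, batch_dims=1)   # (N,) probability of the true class
nll = -tf.math.log(p_true + 1e-12)
```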