hughsalimbeni opened this issue 6 years ago
Indeed many attacks are based on gradients, but for classification you usually just need the logit vector before the softmax; automatic differentiation will work out the gradients for you (if you use tensorflow or pytorch). I can definitely help if you want, since I already have some code to do it.
There are also some ideas for attacking Bayesian methods that actually require more than the logit vector. These might be slightly more involved to implement...
Thanks @YingzhenLi! Certainly we want to use autodiff, but what I'm unsure of is whether the generic testing code should take the gradients itself, or whether we should just require all models to implement something like
```python
def grad_logp(self, x):
    """
    The gradient of the log predictive probabilities wrt x, a single input.
    If x has shape (D,) then the output has shape (K, D), where K is the number of classes.
    """
```
or
```python
def grad_logp(self, X):
    """
    The gradient of the log predictive probabilities wrt X, elementwise over the number of samples.
    If X has shape (N, D), then the output has shape (N, K, D), where K is the number of classes.
    """
```
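To make the batched shape convention concrete, here is a sketch of what such a `grad_logp` could return for a hypothetical linear-softmax model, computed analytically in numpy (the model, the weight matrix `W`, and the function name are illustrative, not part of the proposed interface):

```python
import numpy as np

def grad_logp_linear_softmax(X, W):
    """Analytic (N, K, D) Jacobian of log softmax(X @ W) wrt X,
    illustrating the batched shape convention.
    X: (N, D) inputs; W: (D, K) weights of a hypothetical linear model."""
    Z = X @ W                                   # (N, K) logits
    Z = Z - Z.max(axis=1, keepdims=True)        # numerical stability
    P = np.exp(Z)
    P /= P.sum(axis=1, keepdims=True)           # (N, K) predictive probabilities
    # d log p_k / d x = w_k - sum_j p_j w_j
    mean_w = P @ W.T                            # (N, D)
    return W.T[None, :, :] - mean_w[:, None, :] # (N, K, D)
```

A real model would obtain the same object via autodiff (e.g. a batched Jacobian) rather than by hand.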
Do you know of any references for continuous output models?
Also, not all models use the softmax (e.g. robust max and probit), so any evaluation should be agnostic to the link function.
Some attacks do need the logits (like Carlini-Wagner L2), although I suspect passing in the values before the robust max or probit might work.
Other attacks, like FGSM/PGD/MIM, only need the output probability vector.
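To illustrate why the (log-)probability gradients suffice for FGSM, here is a hedged numpy sketch built on the batched `grad_logp` interface discussed above (`fgsm` and `grad_logp_fn` are hypothetical names, not an agreed API):

```python
import numpy as np

def fgsm(X, y, grad_logp_fn, eps=0.1):
    """One FGSM step. grad_logp_fn maps (N, D) inputs to the (N, K, D)
    gradients of the log predictive probabilities; y holds integer labels."""
    G = grad_logp_fn(X)
    # The cross-entropy loss on the true class is -log p_y, so its
    # gradient wrt the input is -G[n, y[n], :].
    g_loss = -G[np.arange(len(y)), y]    # (N, D)
    return X + eps * np.sign(g_loss)     # single signed-gradient step
```

Nothing here touches logits, so the same loop works whatever link function the model uses.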
My code looks something like the following:
```python
def predict(self, X, softmax=True):
    y = self.model(X)  # value before softmax (the logits)
    if softmax:
        y = tf.nn.softmax(y)
    return y
```
Then set softmax=True or False depending on the attack in use.
A key work in this area is https://github.com/YingzhenLi/Dropout_BBalpha
A problem with implementing this method here is that it needs model gradients. Either we could build a task that supports multiple backends (not ideal) and take the gradients directly, or each model could provide its own gradients, which could then be manipulated in numpy. @YingzhenLi any thoughts?
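To sketch what the second option could look like: if models expose their gradients as numpy arrays, the attack loop itself stays backend-free. A hedched example with PGD (the function name and the `grad_logp_fn` signature are assumptions, not an agreed interface):

```python
import numpy as np

def pgd(X, y, grad_logp_fn, eps=0.3, step=0.05, n_iter=10):
    """Projected gradient descent in pure numpy, against a model that
    supplies its own gradients (so the attack needs no backend).
    grad_logp_fn: (N, D) -> (N, K, D) numpy array of log-prob gradients."""
    X_adv = X.copy()
    for _ in range(n_iter):
        G = grad_logp_fn(X_adv)                  # (N, K, D)
        g = -G[np.arange(len(y)), y]             # loss gradient; ascend it
        X_adv = X_adv + step * np.sign(g)
        X_adv = np.clip(X_adv, X - eps, X + eps) # project to L-inf ball
    return X_adv
```

The model-side cost is only implementing `grad_logp` once per backend, rather than porting every attack.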