Adversarial Attacks and Testing the Robustness of Models

abheesht17 commented 2 years ago

Branching off from the issue which @aflah02 opened a few weeks ago, https://github.com/keras-team/keras-nlp/issues/39:

Is the KerasNLP team interested in implementing adversarial attacks? We could start off with simple attacks on classification models.

I understand if this is a bit broad and the team would rather integrate it into the repository later, especially since it may depend on augmentation APIs. For example, some adversarial attacks perturb only the words to which the model assigns a higher importance score, and we could leverage the augmentation APIs for the perturbation itself.
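To make the idea concrete, here is a minimal sketch in plain Python of an importance-guided attack. Everything in it is an assumption made for illustration: `toy_positive_score` is a hypothetical stand-in for a real classifier, leave-one-out scoring is just one possible way to estimate word importance, and `swap_first_two_chars` stands in for whatever augmentation API ends up being available. None of these names exist in KerasNLP.

```python
from typing import Callable, List


def toy_positive_score(words: List[str]) -> float:
    """Hypothetical classifier stand-in: fraction of words in a tiny positive lexicon."""
    positive = {"good", "great", "excellent"}
    if not words:
        return 0.0
    return sum(w in positive for w in words) / len(words)


def word_importance(
    words: List[str], score_fn: Callable[[List[str]], float]
) -> List[float]:
    """Leave-one-out importance: how much the score drops when a word is removed."""
    base = score_fn(words)
    return [base - score_fn(words[:i] + words[i + 1:]) for i in range(len(words))]


def swap_first_two_chars(word: str) -> str:
    """Hypothetical perturbation (a stand-in for an augmentation layer)."""
    if len(word) < 2:
        return word
    return word[1] + word[0] + word[2:]


def attack(
    words: List[str],
    score_fn: Callable[[List[str]], float],
    perturb_fn: Callable[[str], str],
    k: int = 1,
) -> List[str]:
    """Perturb only the k words with the highest importance scores."""
    importances = word_importance(words, score_fn)
    ranked = sorted(range(len(words)), key=lambda i: importances[i], reverse=True)
    targets = set(ranked[:k])
    return [perturb_fn(w) if i in targets else w for i, w in enumerate(words)]


if __name__ == "__main__":
    sentence = "the movie was great".split()
    adversarial = attack(sentence, toy_positive_score, swap_first_two_chars, k=1)
    print(adversarial, toy_positive_score(adversarial))
```

In a real implementation, the scoring function would wrap a trained model's prediction for the true class, and the perturbation would come from the augmentation APIs. The sketch only shows how the two pieces could fit together.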

A good resource is https://github.com/QData/TextAttack.

chenmoneygithub commented 2 years ago

@abheesht17 Thanks for opening this feature request!

Yes, having an adversarial attack system would be nice for model evaluation. Our current problem is that we do not have any pretrained models available. When you start working on this, would you mind sharing a Colab so that we can do some early reviews of the interface? Thanks!

abheesht17 commented 2 years ago

Sure, @chenmoneygithub! Will do. I'm waiting for some augmentation methods to be implemented before starting on adversarial attacks.

mattdangerw commented 1 year ago

This probably does not quite fit with our current priorities.

keras-team/keras-hub, issue #95