vadel / AudioUniversalPerturbations


Universal Adversarial Examples for Speech Command Classification

This is the code corresponding to the paper "Universal Adversarial Examples for Speech Command Classification", by Jon Vadillo and Roberto Santana.

Introduction

In this work we address the generation of universal adversarial perturbations for Speech Command Classification. The attack approach used to create the universal perturbations (hereinafter referred to as the UAP-HC algorithm) is a Hill-Climbing reformulation of the state-of-the-art method proposed by Moosavi-Dezfooli et al. in [3].
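
The outer loop of the algorithm follows the structure of [3]: a single perturbation is accumulated by repeatedly pushing the not-yet-fooled inputs over the decision boundary and projecting the result back onto a small norm ball. The sketch below is only a simplified illustration of that loop; `classify`, `minimal_perturbation`, and the parameter values are hypothetical stand-ins (in UAP-HC the per-example step is found by hill climbing), not the repository's actual interface.

```python
# Simplified sketch of the outer loop shared by [3] and the UAP-HC reformulation.
# `classify` and `minimal_perturbation` are hypothetical stand-ins: the former queries
# the target model, the latter searches for a small per-example perturbation that
# changes the predicted class (by hill climbing in UAP-HC).
import numpy as np

def universal_perturbation(X, classify, minimal_perturbation,
                           xi=200.0, target_fooling_rate=0.8, max_epochs=10):
    """Accumulate a single perturbation v that fools the model on most inputs in X."""
    v = np.zeros_like(X[0], dtype=np.float32)
    for _ in range(max_epochs):
        for x in X:
            if classify(x + v) == classify(x):       # x + v is not yet adversarial
                dv = minimal_perturbation(x + v)     # small extra perturbation for this input
                v = np.clip(v + dv, -xi, xi)         # project v onto an l_inf ball of radius xi
        fooling_rate = np.mean([classify(x + v) != classify(x) for x in X])
        if fooling_rate >= target_fooling_rate:
            break
    return v
```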

We encourage the reader to listen to some of the generated adversarial examples, accessible on our website.

Dataset

We used the Speech Command Dataset [1] (2nd version) as the testbed for our approach. This dataset consists of a set of WAV audio files of 30 different spoken commands. The duration of every file is fixed to 1 second and the sample rate is 16 kHz, so that each audio waveform is composed of 16,000 values in the range [−2^15, 2^15]. In the paper, we used a subset of ten classes, the standard labels selected in previous publications: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", and "Go". In addition to this set, we considered two special classes: "Unknown" (a command different from the ones specified above) and "Silence" (no speech detected).
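
For illustration, a single utterance from the dataset can be loaded as a raw 16,000-sample waveform as sketched below; the file path, padding policy, and label mapping are assumptions made for the example, not part of the repository.

```python
# Minimal sketch: loading one Speech Command Dataset utterance as a raw waveform.
# The file path and the padding policy are illustrative assumptions.
import numpy as np
from scipy.io import wavfile

CLASSES = ["yes", "no", "up", "down", "left", "right",
           "on", "off", "stop", "go", "unknown", "silence"]
LABEL_TO_INDEX = {label: i for i, label in enumerate(CLASSES)}

def load_command(path, length=16000):
    """Read a 16 kHz, 16-bit WAV file and pad/trim it to exactly one second."""
    rate, waveform = wavfile.read(path)           # int16 samples
    assert rate == 16000, "all dataset files are sampled at 16 kHz"
    waveform = waveform.astype(np.float32)
    if len(waveform) < length:                    # some files are slightly shorter
        waveform = np.pad(waveform, (0, length - len(waveform)))
    return waveform[:length]

x = load_command("speech_commands_v2/yes/example.wav")  # hypothetical path
y = LABEL_TO_INDEX["yes"]
```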

Target models

We selected two different target models, based on the CNN architecture proposed in [2] for small-footprint keyword spotting, as shown in the following figure:

CNN structure of the target models
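
For reference, the sketch below shows a small-footprint keyword-spotting CNN in the spirit of [2], written here in PyTorch; the layer shapes are illustrative guesses and do not reproduce the exact target models.

```python
# Rough sketch of a small-footprint keyword-spotting CNN in the spirit of [2].
# Layer shapes are illustrative, not the exact configuration of the target models.
import torch.nn as nn

class SmallFootprintCNN(nn.Module):
    def __init__(self, n_classes=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=(20, 8)),   # convolution over (time, frequency)
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 3)),        # pool in frequency only
            nn.Conv2d(64, 64, kernel_size=(10, 4)),
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(128),                      # avoids hard-coding the flattened size
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, mfcc):                         # mfcc: (batch, 1, time_frames, n_mfcc)
        return self.classifier(self.features(mfcc))
```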

Target Model A

This model achieves an 85.2% Top One Accuracy on the test set of the Speech Command Dataset (2nd version). The training procedure is based on the code provided in link. However, it includes some modifications to ensure that the whole process, including the MFCC transformation, is end-to-end differentiable.
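
As an illustration of how the MFCC front-end can be kept inside the differentiable graph, the snippet below uses torchaudio; the actual repository code and framework may differ.

```python
# Illustration only: a differentiable MFCC front-end with torchaudio, so that
# gradients computed on the features flow back to the raw waveform.
import torch
import torchaudio

mfcc_transform = torchaudio.transforms.MFCC(
    sample_rate=16000,
    n_mfcc=40,
    melkwargs={"n_fft": 480, "hop_length": 160, "n_mels": 40},
)

waveform = torch.randn(1, 16000, requires_grad=True)  # dummy 1-second input
features = mfcc_transform(waveform)                   # (1, n_mfcc, time_frames)
features.sum().backward()                             # gradients reach the waveform
print(waveform.grad.shape)                            # torch.Size([1, 16000])
```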

Target Model B

This is a pretrained model accessible from this link. It obtained an 82.5% Top One Accuracy on the test set of the Speech Command Dataset (2nd version).

User guide

The code is implemented in Python 3.6.

Generating universal perturbations

The script "Universal_perturbations_multi.py" contains the implementation of the algorithm used for the generation of universal adversarial perturbations. The program takes as input the following parameters:

The program will return the following information:
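
Once a universal perturbation has been generated, applying it to a clean waveform reduces to element-wise addition followed by clipping to the valid amplitude range; the minimal sketch below uses hypothetical file names, not outputs documented by the repository.

```python
# Hypothetical post-processing step: apply a generated universal perturbation v
# to a clean waveform x and clip back to the valid 16-bit amplitude range.
import numpy as np

v = np.load("universal_perturbation.npy")    # hypothetical saved perturbation
x = np.load("clean_waveform.npy")            # hypothetical 16000-sample waveform

x_adv = np.clip(x + v, -2**15, 2**15 - 1)    # keep the signal in the valid range
```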

References

[1] P. Warden, “Speech commands: A dataset for limited-vocabulary speech recognition,” arXiv preprint arXiv:1804.03209, 2018.

[2] T. N. Sainath and C. Parada, “Convolutional neural networks for small-footprint keyword spotting,” in Sixteenth Annual Conference of the International Speech Communication Association, 2015.

[3] S. M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Universal adversarial perturbations,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1765–1773.