lixucuhk / adversarial-attack-on-GMM-i-vector-based-speaker-verification-systems

Implementation of Adversarial Attacks on GMM i-vector based Speaker Verification Systems (ICASSP2020) https://arxiv.org/abs/1911.03078
Apache License 2.0


This repository provides the code implementation of the paper: Adversarial Attacks on GMM i-vector based Speaker Verification Systems.

Adversarial Attack Configuration

Results

Adversarial audio samples & system responses

White box attacks

  1. FAR (%) of the GMM i-vector systems under white box attack with different perturbation degrees (P).

| System | P=0 | P=0.3 | P=1.0 | P=5.0 | P=10.0 |
|---|---|---|---|---|---|
| MFCC-ivec | 7.20 | 82.91 | 96.87 | 18.14 | 16.65 |
| LPMS-ivec | 10.24 | 96.78 | 99.99 | 99.64 | 69.95 |

  2. EER (%) of the GMM i-vector systems under white box attack with different perturbation degrees (P).

| System | P=0 | P=0.3 | P=1.0 | P=5.0 | P=10.0 |
|---|---|---|---|---|---|
| MFCC-ivec | 7.20 | 81.78 | 97.64 | 50.25 | 50.72 |
| LPMS-ivec | 10.24 | 94.04 | 99.95 | 99.77 | 88.60 |
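The perturbation degree P above is the step size of the fast gradient sign method (FGSM) used to craft the adversarial examples. A minimal NumPy sketch of the FGSM step, using a hypothetical linear scorer in place of the actual GMM i-vector scoring back-end (the weights `w` and feature vector `x` are illustrative stand-ins, not the paper's model):

```python
import numpy as np

def fgsm_perturb(x, grad, eps):
    """FGSM step: move the input in the direction of the sign of the
    gradient of the attack objective, scaled by perturbation degree eps."""
    return x + eps * np.sign(grad)

# Toy surrogate: a linear "score" s(x) = w @ x standing in for the
# ASV scoring function (hypothetical stand-in, not the GMM i-vector model).
rng = np.random.default_rng(0)
w = rng.standard_normal(40)   # toy weights over 40 feature dimensions
x = rng.standard_normal(40)   # one frame of acoustic features
grad = w                      # d s(x)/dx for the linear surrogate

for eps in (0.3, 1.0, 5.0):
    x_adv = fgsm_perturb(x, grad, eps)
    print(eps, float(w @ x_adv - w @ x))  # score shift grows with eps
```

For the linear surrogate the score shift is exactly `eps * sum(|w|)`, which is why larger P initially pushes the false acceptance rate up; the non-monotone FAR at large P in the tables reflects the real systems' non-linearity, which this sketch does not capture.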

Black box attacks

  1. EER (%) of the target systems under black box attack with different perturbation degrees (P).

| Attack (source attacks target) | P=0 | P=0.3 | P=1.0 | P=5.0 | P=10.0 | P=20.0 | P=30.0 | P=50.0 |
|---|---|---|---|---|---|---|---|---|
| LPMS-ivec attacks MFCC-ivec | 7.20 | 8.83 | 13.82 | 50.02 | 69.04 | 74.62 | 74.59 | 63.24 |
| MFCC-ivec attacks MFCC-xvec | 6.62 | 8.52 | 14.06 | 57.43 | 74.32 | 60.85 | 54.07 | 51.34 |
| LPMS-ivec attacks MFCC-xvec | 6.62 | 7.42 | 9.49 | 25.47 | 37.51 | 43.89 | 48.48 | 48.39 |

  2. FAR (%) of the target systems under black box attack with different perturbation degrees (P).
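The black box results above rely on transferability: perturbations crafted against a substitute system also shift the scores of an unseen target system. A toy NumPy illustration with two correlated linear scorers standing in for the source and target ASV systems (hypothetical stand-ins, not the paper's GMM i-vector or x-vector models):

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 40
# Two toy linear scorers; the target is correlated with the source,
# mimicking two ASV systems trained on similar data.
w_src = rng.standard_normal(dim)
w_tgt = 0.8 * w_src + 0.2 * rng.standard_normal(dim)
x = rng.standard_normal(dim)

eps = 1.0
x_adv = x + eps * np.sign(w_src)  # FGSM crafted on the *source* system only

delta_src = float(w_src @ (x_adv - x))  # score shift on the attacked system
delta_tgt = float(w_tgt @ (x_adv - x))  # score shift that "transfers"
print(delta_src, delta_tgt)
```

Because the two scorers share most of their weight direction, the perturbation aligned with `sign(w_src)` also increases the target score, though by a smaller margin, mirroring the weaker (but still substantial) EER degradation in the cross-system rows above.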

Dependencies

  1. Python and packages

    This code was tested on Python 3.7.1 with PyTorch 1.0.1. The remaining packages can be installed with:

    pip install -r requirements.txt
  2. Kaldi-io-for-python

    kaldi-io-for-python is used to read Kaldi ark/scp data from Python. See the README.md of kaldi-io-for-python for installation instructions.

Prepare Dataset

  1. Download Voxceleb1 dataset

    To replicate the paper's experiments, download the Voxceleb1 dataset from http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html. It consists of short clips of human speech: 148,642 utterances in total from 1,251 speakers. Consistent with Nagrani et al., 4,874 utterances from 40 speakers are reserved for testing; the remaining utterances are used to train our SV models.

Train ASV models

Perform Adversarial Attacks

Citation

If you use this code in your research, please star this repo and cite our paper as follows:

@article{li2019adversarial,
  title={Adversarial attacks on GMM i-vector based speaker verification systems},
  author={Li, Xu and Zhong, Jinghua and Wu, Xixin and Yu, Jianwei and Liu, Xunying and Meng, Helen},
  journal={arXiv preprint arXiv:1911.03078},
  year={2019}
}

Contact