nii-yamagishilab / PartialSpoof

BSD 3-Clause "New" or "Revised" License

Lin Zhang will defend her thesis on January 30th, 2024, from 16:00 to 17:00 (GMT+9). In her defense, she will summarize her work on PartialSpoof to date. Please feel free to register and join if you are interested!

:tada: Welcome to the story of Partial Spoof (PS)!

This repository is an implementation of the papers related to Partial Spoof. It is adapted from project-NN-Pytorch-scripts. Below are some links that you might be interested in:

Please feel free to give suggestions and feedback. :paw_prints:

Lin Zhang; Xin Wang; Erica Cooper; Nicholas Evans; Junichi Yamagishi

Outline

  1. :boom: Updates
  2. :open_file_folder: Folder and Its Paper
  3. :evergreen_tree: Folder Structure
  4. :clipboard: Citation
  5. :handshake: Acknowledgments and License

:boom: Updates

  • 2023-12: added metrics: EER for spoof detection; SegmentEER and RangeEER for spoof localization.
  • 2023-12: added folders for multiple random seeds and updated the README.
  • 2022-12: released multi-resolution and single-resolution CMs.
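The EER is the operating point at which the false acceptance rate (spoofs accepted as bona fide) equals the false rejection rate (bona fide trials rejected). The snippet below is a minimal NumPy sketch of that computation, assuming higher scores mean "more bona fide"; it is an illustration only, not the code in metric/UtteranceEER.py, whose score convention and threshold handling may differ.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold), assuming higher scores mean 'more bona fide'."""
    bona = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([bona, spoof]))
    # False rejection rate: bona fide trials scored below the threshold.
    frr = np.array([(bona < t).mean() for t in thresholds])
    # False acceptance rate: spoof trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # EER: threshold where the two error rates are closest; report their mean.
    idx = int(np.argmin(np.abs(frr - far)))
    return float((frr[idx] + far[idx]) / 2.0), float(thresholds[idx])


if __name__ == "__main__":
    eer, thr = compute_eer([0.4, 0.6], [0.5, 0.3])  # toy scores
    print(f"EER = {eer:.1%} at threshold {thr}")  # EER = 50.0% at threshold 0.5
```

Sweeping every observed score is quadratic in the number of trials but keeps the logic easy to check; a sorted, cumulative-count version scales better for full evaluation sets.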

:open_file_folder: Folder and its paper

  • 00data-prepare: scripts to generate the PartialSpoof database and its automatic annotations. (To be released)
  • 01singletask: CMs trained on a single task (either utterance-level or segment-level detection), from the paper "An Initial Investigation for Detecting Partially Spoofed Audio". (To be released)
  • 02multitask: CMs trained on multiple tasks (both utterance-level and segment-level detection), from the paper "Multi-task Learning in Utterance-level and Segmental-level Spoof Detection". (To be released)
  • 03multireso: multi-resolution CMs from the paper "The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance".
  • metric: metrics for utterance-level spoof detection and segment-level spoof localization, from the paper "Range-Based Equal Error Rate for Spoof Localization". (To be released)

See [Folder]/README.md for usage details of each folder.
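The multi-resolution CMs in 03multireso work with segment-level labels at several temporal resolutions. As a rough illustration of how frame-level labels could be pooled into coarser segment labels, the sketch below makes three assumptions that are not taken from the repository: 1 = bona fide and 0 = spoof, a segment is spoof if any of its frames is spoof, and the single-reso folder names {2, 4, 8, 16, 32, 64} are read as numbers of base frames per segment. The helper names are hypothetical; check 03multireso/README.md for the conventions actually used.

```python
import math

import numpy as np


def frame_to_segment_labels(frame_labels, frames_per_segment):
    """Pool frame labels (assumed 1 = bona fide, 0 = spoof) into segment labels.

    Assumption: a segment is bona fide only if every frame in it is bona fide,
    so taking the minimum marks any segment containing a spoofed frame as spoof.
    A trailing partial segment is kept as its own segment.
    """
    labels = np.asarray(frame_labels, dtype=int)
    n_seg = math.ceil(len(labels) / frames_per_segment)
    return np.array([
        labels[i * frames_per_segment:(i + 1) * frames_per_segment].min()
        for i in range(n_seg)
    ])


def utterance_label(frame_labels):
    """Utterance-level label under the same assumption ('utt' resolution)."""
    return int(np.min(frame_labels))


if __name__ == "__main__":
    frames = [1, 1, 1, 0, 1, 1, 1, 1]  # toy labels with one spoofed frame
    for n in (2, 4):
        print(n, frame_to_segment_labels(frames, n).tolist())
    # 2 [1, 0, 1, 1]
    # 4 [0, 1]
    print("utt", utterance_label(frames))  # utt 0
```

Coarser resolutions blur the boundary of a short spoofed region into a whole segment, which is one reason to compare several resolutions side by side.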

:evergreen_tree: Folder structure

PartialSpoof
├── 01_download_database.sh         : Script used to download PartialSpoof from zenodo.
├── 03multireso
│   ├── 01_download_pretrained_models.sh    : Script used to download pretrained models.
│   ├── main.py
│   ├── model.py            : Model structure and loss, shared by the multi- and single-reso models.
│   ├── multi-reso      : folder for multi-reso model
│   ├── README.md
│   └── single-reso     : folder for single-reso model
│       └── {2, 4, 8, 16, 32, 64, utt}
├── config_ps               : Config files for experiments
│   ├── config_test_on_dev.py
│   └── config_test_on_eval.py
├── env.sh                      
├── Figures
│   ├── EERs.pdf
│   └── PartialSpoof_logo.png
├── LICENSE
├── metric          
│   ├── cal_EER.sh
│   ├── RangeEER.py
│   ├── README.md
│   ├── rttm_tool.py
│   ├── SegmentEER.py
│   └── UtteranceEER.py
├── database                    : PartialSpoof Databases
│   ├── train
│   ├── dev                     : Folder for dev set
│   │   ├── con_data    : related data files (in Kaldi format)
│   │   ├── con_wav     : waveform
│   │   └── dev.lst     : waveform list
│   ├── eval
│   ├── label2num           : convert string labels to numerical labels.
│   │   └── label2num_2cls_0sil     : bonafide/spoof (More to be released)
│   ├── protocols
│   ├── segment_labels
│   └── vad
│       ├── dev
│       ├── eval
│       └── train
├── modules
│   ├── gmlp.py
│   ├── LICENSE
│   ├── multi_scale
│   │   └── post.py
│   ├── s3prl                   : s3prl repo 
│   └── ssl_pretrain            : Folder to save downloaded pretrained ssl model
├── project-NN-Pytorch-scripts.202102   : Modified project-NN-Pytorch-scripts repo
└── README.md
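The con_data files follow the Kaldi format, i.e. plain text with one key per line followed by its value(s), and label2num converts string labels to numerical labels. The sketch below parses such "key value" lines and encodes labels with a lookup table; it assumes label2num is stored in the same two-column layout, the helper names are hypothetical, and the actual contents of label2num_2cls_0sil are not reproduced here, so inspect the file before relying on a particular mapping.

```python
from typing import Dict, Iterable, List


def parse_two_column(lines: Iterable[str]) -> Dict[str, str]:
    """Parse Kaldi-style 'key value...' lines into a dict; blank lines are skipped."""
    table: Dict[str, str] = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue
        # The key is the first whitespace-separated token; the rest is the value.
        table[parts[0]] = parts[1] if len(parts) > 1 else ""
    return table


def encode_labels(labels: Iterable[str], label2num: Dict[str, str]) -> List[int]:
    """Convert string labels to integers; an unknown label raises KeyError."""
    return [int(label2num[label]) for label in labels]
```

Because `parse_two_column` accepts any iterable of lines, an open file handle can be passed directly, e.g. `with open(path) as f: mapping = parse_two_column(f)`. Raising on unknown labels, rather than silently defaulting, surfaces mismatches between a label file and the chosen label2num table early.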

:clipboard: Citation

If the ideas, code, or pretrained models are helpful to your research, please cite the corresponding paper.

@inproceedings{zhang21ca_interspeech,
  author={Lin Zhang and Xin Wang and Erica Cooper and Junichi Yamagishi and Jose Patino and Nicholas Evans},
  title={{An Initial Investigation for Detecting Partially Spoofed Audio}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={4264--4268},
  doi={10.21437/Interspeech.2021-738}
}
@inproceedings{zhang21_asvspoof,
  author={Lin Zhang and Xin Wang and Erica Cooper and Junichi Yamagishi},
  title={{Multi-task Learning in Utterance-level and Segmental-level Spoof Detection}},
  year=2021,
  booktitle={Proc. 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge},
  pages={9--15},
  doi={10.21437/ASVSPOOF.2021-2}
}
@article{10003971,
  author={Zhang, Lin and Wang, Xin and Cooper, Erica and Evans, Nicholas and Yamagishi, Junichi},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  title={{The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance}},
  year={2023},
  volume={31},
  pages={813--825},
  doi={10.1109/TASLP.2022.3233236}
}

:handshake: Acknowledgments

This study is partially supported by the Japanese-French joint national VoicePersonae project, supported by JST CREST (JPMJCR18A6, JPMJCR20D3), JPMJFS2136, and the ANR (ANR-18-JSTS-0001); by MEXT KAKENHI Grants (21K17775, 21H04906, 21K11951, 18H04112), Japan; and by the Google AI for Japan program.

:clipboard: License

This project is mainly licensed under the BSD 3-Clause License (./LICENSE). Folders that use external libraries may contain their own LICENSE files; please refer to the README.md file in each folder for details.

Additionally, the licenses of some of the external libraries used are listed below:

  • modules/s3prl is licensed under the MIT License (modules/s3prl/LICENSE.txt), but please note that the latest version of s3prl is now under the Apache License version 2.0.
  • project-NN-Pytorch-scripts.202102 is licensed under the BSD 3-Clause License (project-NN-Pytorch-scripts.202102/LICENSE).
  • modules/gmlp.py is licensed under the MIT License (modules/LICENSE).