pyannote / pyannote-db-voxceleb

VoxCeleb plugin for pyannote.database
http://github.com/pyannote/pyannote-database
Other
29 stars 9 forks source link

VoxCeleb plugin for pyannote.database

This package provides an implementation of the speaker verification protocols used in the VoxCeleb paper.

Citation

Please cite the following reference if your research relies on the VoxCeleb dataset:

@InProceedings{VoxCeleb,
  author = {Nagrani, A. and Chung, J.~S. and Zisserman, A.},
  title = {{VoxCeleb: a large-scale speaker identification dataset}},
  booktitle = {{Interspeech 2017, 18th Annual Conference of the International Speech Communication Association}},
  year = {2017},
  month = {August},
  address = {Stockholm, Sweden},
  url = {http://www.robots.ox.ac.uk/~vgg/data/voxceleb/},
}

Please cite the following references if your research relies on this package. This is where the whole pyannote.database framework was first introduced:

@inproceedings{pyannote.metrics,
  author = {Herv\'e Bredin},
  title = {{pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems}},
  booktitle = {{Interspeech 2017, 18th Annual Conference of the International Speech Communication Association}},
  year = {2017},
  month = {August},
  address = {Stockholm, Sweden},
  url = {http://pyannote.github.io/pyannote-metrics},
}

Installation

$ pip install pyannote.db.voxceleb
/path/to/voxceleb/voxceleb1/dev/wav/id10001/1zcIwhmdeo4/00001.wav
/path/to/voxceleb/voxceleb1/test/wav/id10270/5r0dWxy17C8/00001.wav
/path/to/voxceleb/voxceleb2/dev/aac/id00012/21Uxsk56VDQ/00001.m4a
/path/to/voxceleb/voxceleb2/test/aac/id00017/01dfn2spqyE/00001.m4a
Databases:

  VoxCeleb:
    - /path/to/voxceleb/voxceleb1/dev/wav/{uri}.wav
    - /path/to/voxceleb/voxceleb1/test/wav/{uri}.wav
    - /path/to/voxceleb/voxceleb2/dev/aac/{uri}.wav
    - /path/to/voxceleb/voxceleb2/test/aac/{uri}.wav

Note that m4a files from VoxCeleb 2 have to be converted to wav for pyannote.audio to handle them nicely.

Usage

We start by initializing VoxCeleb2 speaker verification protocol.

from pyannote.database import get_protocol
protocol = get_protocol('VoxCeleb.SpeakerVerification.VoxCeleb2')

One can use protocol.train generator to train the background model.

for training_file in protocol.train():
    uri = training_file['uri']
    print('Current filename is {0}.'.format(uri))

    # "who speaks when" as a pyannote.core.Annotation instance
    annotation = training_file['annotation']
    for segment, _, speaker in annotation.itertracks(yield_label=True):
        print('{0} speaks between t={1:.1f}s and t={2:.1f}s.'.format(
            speaker, segment.start, segment.end))

    break  # this should obviously be replaced
           # by the actual background training

Use protocol.test_trial to iterate over all test trials:

for t, trial in enumerate(protocol.test_trial()):

    file1 = trial['file1']    
    file2 = trial['file2']    
    same = trial['reference']

    msg = f"{file1['uri']} vs. {file2['uri']} {'same' if same else 'different'}"
    print(msg)

    if t > 10:
        break    

Here is how to get the list of all available speaker verification protocols.

from pyannote.database import get_database
voxceleb = get_database('VoxCeleb')

for protocol_name in voxceleb.get_protocols('SpeakerVerification'):
    print(f'VoxCeleb.SpeakerVerification.{protocol_name}')