pyannote / pyannote-database

Reproducible experimental protocols for multimedia (audio, video, text) database
MIT License

feat: add support for custom SpeakerVerification protocols #50

Closed vbrignatz closed 4 years ago

vbrignatz commented 4 years ago

I modified custom.py to support the creation of speaker verification protocols on the fly.

I added two keys to the configuration file database.yml:

I tested both validation and training of a speaker embedding model with VoxCeleb2 as the custom dataset (which I named myVoxCeleb), and it worked.

hbredin commented 4 years ago

Thanks @vbrignatz - this is a very nice addition to the package 🎉

I have been working in parallel on refactoring pyannote.database.custom (and the introduction of custom data loaders) to make it much more flexible. Work is in progress in branch custom (or pull request #51).

This will undoubtedly conflict with the changes you propose in this pull request. Therefore, I will come back to this pull request when #51 has been merged.

Feel free to give your opinion on #51 as well (in particular how it could be improved to make speaker identification custom protocols easier to define).

hbredin commented 4 years ago

Hi @vbrignatz, would you mind updating your PR to work on the custom branch?

I have added a bunch of things (and proper documentation) that should make the integration of new custom tasks (such as speaker verification) easier.

If you do not want to, or do not have time to do it, please let me know and I will merge the custom branch as it is. Otherwise, it can wait for your updated PR.

vbrignatz commented 4 years ago

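Hi @hbredin, I will work on this tomorrow. FYI, my thinking is that I should create:

the DURLoader class to load the durations files
the subset_trial_iter function that will create the trial function needed in SpkVerif protocols

and that I should modify:

add_custom_protocols and create_protocol

to support the custom SpkVerif protocols.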

hbredin commented 4 years ago

> Hi @hbredin, I will work on this tomorrow.

Great. Thanks!

> the DURLoader class to load the durations files

It could be something slightly more generic that expects a text file (e.g. with a .map suffix) in the following "uri value" format:

filename1 value1
filename2 value2
filename3 value3

I can already think of two use cases for this kind of data loader:

The only issue I foresee is how to make sure that durations are returned as float and domains as str. I think pandas.read_csv (pandas is already a requirement for pyannote.database anyway) is smart enough to do the conversion itself but maybe there is another way...
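To illustrate the type-inference point, here is a minimal sketch of such a generic loader. It is not the loader that ended up in pyannote.database: the function name `load_map` is hypothetical, and the sketch assumes one whitespace-separated "uri value" pair per line, with no spaces inside file names. It relies on `pandas.read_csv` inferring the type of the value column, then normalizes numeric columns to float so that integer-looking durations such as `20` do not come back as integers.

```python
import io

import pandas as pd


def load_map(source):
    """Load a "uri value" mapping file into a dict (hypothetical helper).

    `source` can be a path or a file-like object. Numeric values are
    returned as float, anything else as str.
    """
    table = pd.read_csv(
        source,
        sep=r"\s+",              # one or more whitespace characters
        header=None,             # the file has no header row
        names=["uri", "value"],
        dtype={"uri": str},      # never let a numeric-looking uri become a number
        keep_default_na=False,   # keep values such as "NA" as plain strings
    )
    values = table["value"]
    if pd.api.types.is_numeric_dtype(values):
        # pandas infers int64 when every value looks like an integer,
        # so force float to get consistent durations
        values = values.astype(float)
    # tolist() converts numpy scalars to native Python objects
    return dict(zip(table["uri"].tolist(), values.tolist()))


# Same loader, two kinds of content
durations = load_map(io.StringIO("filename1 1.5\nfilename2 20\n"))
domains = load_map(io.StringIO("filename1 news\nfilename2 phone\n"))
```

The inference is per column, so a single non-numeric entry in a durations file would silently turn every value into a string. A stricter loader could take the expected type as an argument instead of guessing.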

> the subset_trial_iter function that will create the trial function needed in SpkVerif protocols

Yes. Will you always assume that try_with contains the whole file?
Or do you have any idea how we could support trials with file excerpts? It is OK if your answer to the second question is "no": I'll live with that :-)
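To make the whole-file case concrete, here is a minimal sketch of a trial generator. It is not the code from this pull request. The name `iter_trials`, the "reference uri1 uri2" line format, and the representation of `try_with` as a `(start, end)` tuple in seconds are all assumptions made for illustration. The sketch shows why a durations mapping is needed: without excerpt boundaries in the trial file, `try_with` can only span from 0 to the file duration.

```python
def iter_trials(lines, durations):
    """Yield one trial dict per "reference uri1 uri2" line (hypothetical format).

    `durations` maps each uri to its duration in seconds. Every trial
    assumes that `try_with` covers the whole file.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        reference, uri1, uri2 = line.split()
        yield {
            # 1 for a same-speaker trial, 0 for a different-speaker trial
            "reference": int(reference),
            "file1": {"uri": uri1, "try_with": (0.0, durations[uri1])},
            "file2": {"uri": uri2, "try_with": (0.0, durations[uri2])},
        }


durations = {"filename1": 4.0, "filename2": 7.5}
trials = list(iter_trials(["1 filename1 filename2", ""], durations))
```

Supporting file excerpts would presumably mean replacing the `(0.0, duration)` tuples with boundaries read from the trial file itself, which requires a richer line format than the one assumed here.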

> and that I should modify: add_custom_protocols and create_protocol

Yes!

And an update to the README for completeness ;-)

Thanks again. Looking forward to it!