Speaker-Embeddings

An implementation of the Generalized End-to-End (GE2E) loss for speaker verification. Training an encoder with this loss yields speaker embeddings.

This is only a small project for practicing how to reproduce a paper. If you want a repository that actually reproduces the paper, please refer to Resemblyzer or the encoder module of Voice Cloning.
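To make the idea concrete, here is a minimal sketch of the softmax variant of the GE2E loss in plain Python. It is not the code from this repository or from Resemblyzer. The function names (`ge2e_softmax_loss`, `_cosine`, `_mean`) and the nested-list input layout are assumptions made for illustration. The scale `w` and bias `b` are learnable in the paper; here they are fixed at the paper's initial values (10, -5). A real implementation would use a tensor library and backpropagate through the loss.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def _mean(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def ge2e_softmax_loss(embeddings, w=10.0, b=-5.0):
    """Sum of the GE2E softmax loss over a batch.

    embeddings[j][i] is the embedding of utterance i of speaker j.
    Every speaker needs at least two utterances, because the centroid
    of the true speaker is computed without the utterance being scored.
    """
    centroids = [_mean(utts) for utts in embeddings]
    total = 0.0
    for j, utts in enumerate(embeddings):
        for i, e in enumerate(utts):
            # Centroid of the true speaker, excluding the utterance itself.
            own_centroid = _mean(utts[:i] + utts[i + 1:])
            sims = []
            for k, c in enumerate(centroids):
                target = own_centroid if k == j else c
                sims.append(w * _cosine(e, target) + b)
            # Numerically stable log-sum-exp over all speakers.
            m = max(sims)
            log_sum_exp = m + math.log(sum(math.exp(s - m) for s in sims))
            # Pull toward the own centroid, push away from the others.
            total += log_sum_exp - sims[j]
    return total
```

A batch in which each speaker's utterances cluster together should give a lower loss than a batch in which the utterances are mixed up between speakers.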

Posts on reproducing ML papers

https://towardsdatascience.com/converting-deep-learning-research-papers-to-code-f-f38bbd87352f
https://medium.com/@derekchia/common-problems-when-reproducing-a-machine-learning-paper-17178515d6c6

How the author of Resemblyzer implements GE2E loss

Real-time Voice Cloning thesis - Section 3.3

MultiReader technique

The authors introduced the MultiReader technique to combine different data sources. This makes it possible to train with multiple keywords (text-dependent speaker verification, TD-SV) and multiple languages (text-independent speaker verification, TI-SV, as well as TD-SV), and it helps with the problem of limited training data.
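In the GE2E paper, MultiReader optimizes a weighted sum of the expected loss on each data source, where each source has its own weight. The sketch below approximates each expectation with the mean over one batch. The function name `multireader_loss` and its argument layout are assumptions made for illustration, not part of this repository.

```python
def multireader_loss(per_source_losses, weights):
    """Weighted sum of the mean loss of each data source.

    per_source_losses[k] holds the per-example losses of one batch drawn
    from data source k; weights[k] is the importance of that source.
    """
    if len(per_source_losses) != len(weights):
        raise ValueError("need exactly one weight per data source")
    total = 0.0
    for losses, alpha in zip(per_source_losses, weights):
        if not losses:
            raise ValueError("each data source needs at least one example")
        # The batch mean stands in for the expectation over the source.
        total += alpha * sum(losses) / len(losses)
    return total
```

For example, `multireader_loss([[1.0, 3.0], [10.0]], [1.0, 0.5])` combines a mean loss of 2.0 from the first source with half of the 10.0 from the second, giving 7.0. A small source can thus act as a regularized target while a larger source contributes most of the gradient signal.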

Dataset

VCTK is a large multi-speaker dataset, sufficient for training
Mozilla Common Voice is a smaller, crowdsourced multi-speaker dataset (can be sufficient for prototyping)
VIVOS is a good multi-speaker Vietnamese voice dataset
