resemble-ai / Resemblyzer

A python package to analyze and compare voices with deep learning
Apache License 2.0
2.79k stars, 429 forks

Pre-trained model accuracy #16

Closed: bharat-patidar closed this issue 4 years ago

bharat-patidar commented 4 years ago

Hi Corentin, thanks for providing this amazing repo. I was going through the speaker diarization script and tried to create an embedding of my voice. I tested it on multiple audio clips recorded on different devices and got decent results. Would you suggest retraining or fine-tuning this model for better accuracy? Or should I experiment with other available pretrained models instead?

CorentinJ commented 4 years ago

The model is trained on a speaker verification task, and the resulting embeddings happen to transfer well to TTS. Training therefore involves many speakers, and the more speakers there are, the higher the quality of the embeddings. Fine-tuning on a single speaker thus makes little sense.

Fine-tuning Tacotron on your single speaker's embedding does make sense, but it's going to be hard with my other repo the way it is.
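
For the original question, one way to put a number on "decent results" before trying other models is to compare embeddings of the same voice recorded on different devices. Below is a minimal sketch of that comparison using cosine similarity, which is the usual scoring for speaker verification embeddings. The helper names `cosine_similarity` and `same_speaker`, the `0.75` threshold, and the file names in the comments are assumptions made for illustration, not part of Resemblyzer; a usable threshold would have to be tuned on your own recordings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float],
                 threshold: float = 0.75) -> bool:
    """Hypothetical decision rule: accept if similarity reaches the threshold.

    The default of 0.75 is an illustrative assumption, not a recommended value.
    """
    return cosine_similarity(a, b) >= threshold


# With Resemblyzer installed, the embeddings could be obtained like this
# (file names are placeholders):
#
#   from resemblyzer import VoiceEncoder, preprocess_wav
#   encoder = VoiceEncoder()
#   embed_a = encoder.embed_utterance(preprocess_wav("device_a.wav"))
#   embed_b = encoder.embed_utterance(preprocess_wav("device_b.wav"))
#   print(cosine_similarity(embed_a, embed_b))
```

Scoring pairs of recordings from the same speaker against pairs from different speakers shows how well the pretrained embeddings separate voices across your devices, which is a more direct check than fine-tuning on one speaker.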