Minimal amount of data for one speaker

Hi @AigizK,

Thanks for your question.

During training, you don't necessarily need data from a particular speaker for the model to be able to recognise them at inference time.

What you need instead is a dataset with enough variety (many different speakers) in a given language, so that your model works well on speakers of that language. Datasets in Japanese, French, German, Chinese, English and Spanish, ready to be used with diarizers, are available here.

Of course, if you have data for a specific speaker, you can add it to this training data, and it should improve your model's recognition rate for that speaker.
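To make the idea of mixing in speaker-specific data concrete, here is a minimal sketch in plain Python. It is not the diarizers API: the record layout (`speakers`, `timestamps_start`, `timestamps_end`) is only assumed to resemble a diarization annotation, and the helper names `add_speaker_data` and `speech_seconds_per_speaker` are hypothetical. It just shows appending extra recordings to a multi-speaker set and checking how much speech each speaker contributes.

```python
from collections import defaultdict


def speech_seconds_per_speaker(records):
    """Sum the annotated speech duration (in seconds) for each speaker label."""
    totals = defaultdict(float)
    for rec in records:
        segments = zip(rec["speakers"], rec["timestamps_start"], rec["timestamps_end"])
        for speaker, start, end in segments:
            totals[speaker] += end - start
    return dict(totals)


def add_speaker_data(base_records, speaker_records):
    """Return a new list: the multi-speaker base set plus the extra recordings."""
    return list(base_records) + list(speaker_records)


# Toy multi-speaker base set (assumed layout, made-up values for illustration).
base = [
    {"speakers": ["A", "B", "A"],
     "timestamps_start": [0.0, 2.0, 5.0],
     "timestamps_end": [2.0, 5.0, 6.0]},
    {"speakers": ["C", "D"],
     "timestamps_start": [0.0, 4.0],
     "timestamps_end": [4.0, 7.0]},
]

# Extra recording that contains the speaker you care about.
extra = [
    {"speakers": ["target", "E"],
     "timestamps_start": [0.0, 3.0],
     "timestamps_end": [3.0, 4.0]},
]

combined = add_speaker_data(base, extra)
totals = speech_seconds_per_speaker(combined)
```

Inspecting `totals` before training is a quick way to see whether the target speaker is represented at all and whether the rest of the set still covers many different speakers.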

Hope this helps!

huggingface / diarizers