resemble-ai / Resemblyzer

A Python package to analyze and compare voices with deep learning
Apache License 2.0
2.67k stars, 419 forks

Clarification on embeddings training #40

Open davide-scalzo opened 3 years ago

davide-scalzo commented 3 years ago

Hi @CorentinJ! Great repo! I have one question regarding how the embeddings are trained. Are they trained using cosine similarity, Euclidean distance, or some other loss?

I'm trying to use this repo in conjunction with https://github.com/wq2012/SpectralCluster, but the results don't make a lot of sense (see https://github.com/wq2012/SpectralCluster/issues/6). It seems this might be due to an incompatibility between the two libraries if the embeddings are not trained on Euclidean distance. If so, is there a suggested library for a clustering algorithm where the number of speakers is not known in advance?

CorentinJ commented 3 years ago

It is cosine similarity. As for the issue with kmeans, I found this thread.
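To illustrate why the choice of metric may matter less than it seems, here is a minimal sketch, assuming the embeddings are L2-normalized (which is worth verifying on real embeddings with `np.linalg.norm`). For unit-length vectors, squared Euclidean distance equals `2 - 2 * cosine_similarity`, so both metrics rank pairs identically. The random vectors and the `l2_normalize` helper below are stand-ins for illustration, not Resemblyzer output or API.

```python
import numpy as np

def l2_normalize(x):
    """Scale each vector along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-in for two 256-dimensional speaker embeddings (random, not real data).
rng = np.random.default_rng(0)
a, b = l2_normalize(rng.normal(size=(2, 256)))

cos_sim = float(a @ b)                  # cosine similarity of unit vectors
sq_dist = float(np.sum((a - b) ** 2))   # squared Euclidean distance

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(cos_sim, sq_dist, 2 - 2 * cos_sim)
```

If the identity holds for your embeddings, a Euclidean-based clustering method sees the same neighbor ordering as a cosine-based one, so poor results are more likely to come from the algorithm's other assumptions or parameters.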

I don't know of a good way of determining the number of speakers. I have a vague idea, which I detailed in #10. One of our engineers, @adityatb, has been looking into it recently, so he might be of more help.

davide-scalzo commented 3 years ago

Thanks @CorentinJ I'll look into it!

adityatb commented 3 years ago

Hey @davodesign84. I tried clustering with an unknown number of speakers, and with a little tweaking I got some decent results with HDBSCAN. I also looked into x-means and UMAP, which were interesting and might be useful.

davide-scalzo commented 3 years ago

Hi @adityatb, that's exactly what I tried, but with little success. Could you give some indication of which parameters you used?