resemble-ai / Resemblyzer

A python package to analyze and compare voices with deep learning
Apache License 2.0

Getting different speaker embeddings from same wav file on different machine #81

Closed: predawnang closed this issue 1 year ago

predawnang commented 1 year ago

Hi, I'm planning to use Resemblyzer as a metric to evaluate a voice conversion model, but I find that the speaker embedding generated by Resemblyzer differs for the same wav file on different machines.

On one machine it gives me:

[image]

and on the other it gives me:

[image]

Both machines run Ubuntu 18.04.6, and the two embeddings differ slightly in the decimal places. Resemblyzer on both machines is version 0.1.1dev0, installed through pip.

I want to use speaker verification as a metric to test my model, but since the embeddings differ, it's hard to reproduce my results on different machines. Is there a way to make the embedding consistent across machines?

Thanks

CorentinJ commented 1 year ago

You can expect variations in float precision for torch operations depending on the environment they are running on. These variations are usually no higher than 1e-06. As far as I know, there is no foolproof way to solve this problem without disabling pytorch optimizations.

You can safely round the embedding to 5 decimal places, or assume that you will have small floating point errors in your computations and adapt your code accordingly. NumPy and torch both have an `allclose` function (`np.allclose`, `torch.allclose`) that might be helpful.
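As a rough sketch of the tolerance-based comparison suggested above: the snippet below uses NumPy only and fakes the second machine by adding noise of about 1e-07 to a random unit vector. The 256-dimensional size, the helper names `embeddings_match` and `cosine_similarity`, and the `atol=1e-5` threshold are assumptions chosen for illustration, not part of Resemblyzer. In real use the two arrays would be the embeddings computed on each machine.

```python
import numpy as np

def embeddings_match(a, b, atol=1e-5):
    """Return True if two embeddings agree element-wise within atol."""
    # rtol=0.0 so that only the absolute tolerance is applied.
    return bool(np.allclose(a, b, rtol=0.0, atol=atol))

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in for an embedding computed on machine A (hypothetical data).
rng = np.random.default_rng(0)
embed_a = rng.random(256).astype(np.float32)
embed_a /= np.linalg.norm(embed_a)

# Stand-in for the same embedding on machine B: tiny float noise added.
noise = rng.uniform(-1e-7, 1e-7, size=256).astype(np.float32)
embed_b = embed_a + noise

print("bitwise equal:   ", np.array_equal(embed_a, embed_b))
print("within tolerance:", embeddings_match(embed_a, embed_b))
print("cosine similarity:", cosine_similarity(embed_a, embed_b))
```

For a speaker verification metric, comparing similarity scores with a tolerance like this is likely more robust than rounding, since two values that straddle a rounding boundary can still round to different digits.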

predawnang commented 1 year ago

Thank you for your reply, it's helpful.