tranctan closed this issue 4 years ago.
I just figured out by chance that if we load the audio into a numpy array ourselves (with librosa or scipy) before feeding it to the `preprocess_wav()` function in the `resemblyzer.audio` module, we need to make sure the data is resampled to 16,000 Hz. Alternatively, we can simply pass the audio file path to `preprocess_wav()` instead.
This is a trivial detail, but the mistake is really hard to spot.
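A minimal sketch of why the sample rate matters, assuming the 16 kHz target described above. If a 44.1 kHz array is passed as though it were already at 16 kHz, the encoder effectively hears the speech slowed down by 44100/16000 ≈ 2.76×, which distorts the voice. The helper `resample_linear` is hypothetical and uses plain linear interpolation with no anti-aliasing filter, so it is only illustrative. In practice `librosa.load(path, sr=16000)` resamples at load time, and Resemblyzer's `preprocess_wav` appears to accept a `source_sr` argument for in-memory arrays (worth confirming against the installed version's docstring).

```python
import numpy as np

TARGET_SR = 16_000  # sample rate Resemblyzer expects, per the comment above


def resample_linear(wav: np.ndarray, source_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Crude linear-interpolation resampler (hypothetical helper, no anti-aliasing).

    Only meant to illustrate the sample-rate pitfall; a band-limited
    resampler such as librosa's is preferable for real audio.
    """
    if source_sr == target_sr:
        return wav.astype(np.float32)
    # Keep the duration in seconds constant while changing the sample count
    duration = len(wav) / source_sr
    n_out = int(round(duration * target_sr))
    # Timestamps (in seconds) of the input and output samples
    t_in = np.arange(len(wav)) / source_sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, wav).astype(np.float32)


# Usage: 1 second of a 440 Hz tone recorded at 44.1 kHz
sr = 44_100
t = np.arange(sr) / sr
wav = np.sin(2 * np.pi * 440 * t).astype(np.float32)
wav_16k = resample_linear(wav, sr)
print(len(wav), "->", len(wav_16k))  # 44100 -> 16000
```

The same one second of audio now has 16,000 samples, which is what the encoder assumes when it receives a bare array.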
Hi, when I visualized the voice embeddings, one sample (a female voice) lies far away from the male speaker's utterances, which is expected.
However, when I compute the cosine similarity between the female utterance and the male ones, the value is quite high (0.88). I'm not sure whether I'm computing the cosine similarity correctly.
Any help is very much appreciated!
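A minimal numpy sketch of the cosine-similarity computation, cos(a, b) = a·b / (‖a‖‖b‖), assuming each utterance has already been turned into a fixed-size embedding vector (for example with Resemblyzer's `VoiceEncoder.embed_utterance`). The function names and the toy vectors are hypothetical. Because an absolute value such as 0.88 is hard to interpret on its own, the usage compares the cross-speaker similarity with the within-speaker similarity; what matters is whether same-speaker pairs score clearly higher.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def similarity_matrix(embeds_a: np.ndarray, embeds_b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between rows of embeds_a (n, d) and embeds_b (m, d)."""
    a = embeds_a / np.linalg.norm(embeds_a, axis=1, keepdims=True)
    b = embeds_b / np.linalg.norm(embeds_b, axis=1, keepdims=True)
    return a @ b.T  # shape (n, m)


# Toy 3-D "embeddings" (hypothetical values): one row per utterance
male = np.array([[0.9, 0.4, 0.1],
                 [0.8, 0.5, 0.2],
                 [0.85, 0.45, 0.15]])
female = np.array([[0.3, 0.9, 0.3]])

within_male = similarity_matrix(male, male)
# Average the off-diagonal entries only (each utterance vs. itself is always 1)
n = len(male)
mean_within = (within_male.sum() - np.trace(within_male)) / (n * (n - 1))
mean_cross = similarity_matrix(female, male).mean()
print(f"within-speaker: {mean_within:.3f}, cross-speaker: {mean_cross:.3f}")
```

Even in this toy case the cross-speaker similarity is well above zero, because all the vectors have non-negative components; the within-speaker average is what gives it meaning.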