resemble-ai / Resemblyzer

A python package to analyze and compare voices with deep learning
Apache License 2.0

Changing <partials_n_frames> to reduce partial utterance length and increase resolution (diarization with spectral clustering) #50

Open dcanones opened 3 years ago

dcanones commented 3 years ago

Hi!

I am trying to implement the paper https://arxiv.org/pdf/1710.10468.pdf to build an unsupervised diarization algorithm using the d-vectors produced by the pre-trained model in Resemblyzer.

I found that the length of the partial utterances (1.6 s), determined by the hyperparameter partials_n_frames with a default value of 160, may be too long. In the paper, the authors recommend a window size of 240 ms and a step of 120 ms for this kind of diarization.

Can this parameter be changed easily? It is implemented as a setting in the source code (hyperparams.py) rather than as an argument of a function or method, so it looks as if modifying it is not a good idea.

Thanks in advance.

David.

sourav1122 commented 3 years ago

Did you change the mel window length to 240 and the mel window step to 120?

hbq-ruc commented 2 years ago

I have the same question, as I ran into resolution issues while implementing the same paper. I'm a little confused about why partials_n_frames isn't a changeable parameter. Have you tried changing it?

kafan1986 commented 2 years ago

Who said partials_n_frames cannot be changed? If my partial utterance duration is 400 ms (the default is 1.6 seconds), I would set the rate to 2.5 and change partials_n_frames to 40, so that mel_window_step * partials_n_frames == partial utterance duration.
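
The arithmetic in that last comment can be sketched as below. This is only a rough illustration, not code from Resemblyzer: the helper `partial_settings` and its arguments `window_ms` and `hop_ms` are hypothetical names. It assumes that mel_window_step is 10 ms (which is what 160 frames = 1.6 s implies) and that the rate is the number of partial utterances per second, so check both against your installed version before relying on the numbers.

```python
# Assumption: one mel frame every 10 ms, as implied by 160 frames = 1.6 s.
MEL_WINDOW_STEP_MS = 10


def partial_settings(window_ms, hop_ms=None, mel_window_step_ms=MEL_WINDOW_STEP_MS):
    """Hypothetical helper: map a desired partial window and hop (in ms)
    to a partials_n_frames value and a rate (partials per second)."""
    if hop_ms is None:
        hop_ms = window_ms  # no overlap between consecutive partials
    if not 0 < hop_ms <= window_ms:
        # A hop longer than the window would leave gaps in the utterance.
        raise ValueError("hop_ms must be in (0, window_ms]")
    if window_ms % mel_window_step_ms != 0:
        raise ValueError("window_ms must be a multiple of the mel window step")

    # mel_window_step * partials_n_frames == partial utterance duration
    partials_n_frames = window_ms // mel_window_step_ms
    # One partial every hop_ms milliseconds.
    rate = 1000 / hop_ms
    return partials_n_frames, rate


# The 400 ms example from the comment above, without overlap.
print(partial_settings(400))
# The 240 ms window with a 120 ms step mentioned in the original question.
print(partial_settings(240, 120))
```

With these assumptions, the 400 ms case gives partials_n_frames = 40 and a rate of 2.5, matching the comment above, and the paper's 240 ms / 120 ms setting would correspond to partials_n_frames = 24 with a rate of about 8.3. Whether the pre-trained model still produces useful d-vectors on partials that short is a separate question that this sketch does not answer.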