resemble-ai / Resemblyzer

A python package to analyze and compare voices with deep learning
Apache License 2.0
2.66k stars 419 forks

After diarization, the timestamps I get do not match the original file #70

Open teoh79 opened 2 years ago

teoh79 commented 2 years ago

Hello everybody, and first of all thanks to this community for supporting the developers.

I tried Resemblyzer diarization, and the timestamps I got for each speaker do not match the original file.

For example: 1/ the last timestamp does not correspond to the end time of the wav file, even though we speak right up to the end.

2/ Does removing silence shift every timestamp relative to the original wav file?

3/ Is the original wav trimmed during the VAD step, or during any other step (segmentation, clustering...)?

Thanks in advance!

theashishbhatt commented 2 years ago

@teoh79 I have noticed the same issue. The audio length in the output is shorter than the actual audio length.

ConnieZi commented 2 years ago

That is probably because silences in the input audio are trimmed when preprocess_wav is used. Issues #45 and #63 describe similar problems. I am considering trimming the silences in the original audio as well before preprocessing, so that it matches the Resemblyzer output. This is also mentioned as the solution in #63, which says that wav is actually the trimmed audio. Other than that, I hope there are other solutions.
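An alternative to trimming the original file is to map timestamps from the trimmed audio back to the original timeline. The sketch below assumes you have a per-sample boolean mask marking which original samples were kept after silence trimming. As far as I know, preprocess_wav does not return such a mask, so `keep_mask` and the helper `trimmed_to_original_time` are hypothetical: you would have to recompute the mask yourself with the same VAD settings.

```python
import numpy as np

def trimmed_to_original_time(t_trimmed, keep_mask, sampling_rate):
    """Map a time (seconds) in the silence-trimmed audio back to the original audio.

    keep_mask is a hypothetical per-sample boolean array: True where the
    original sample survived trimming, False where it was dropped.
    """
    # Original sample index of every sample that was kept, in order.
    kept_indices = np.flatnonzero(keep_mask)
    if kept_indices.size == 0:
        raise ValueError("keep_mask retains no samples")
    # Sample position in the trimmed audio, clamped to the valid range.
    i = int(round(t_trimmed * sampling_rate))
    i = min(max(i, 0), kept_indices.size - 1)
    return float(kept_indices[i]) / sampling_rate

# Toy example at 10 Hz: 1 s silence, 1 s speech, 1 s silence, 1 s speech.
sr = 10
keep_mask = np.array([False] * 10 + [True] * 10 + [False] * 10 + [True] * 10)
print(trimmed_to_original_time(0.0, keep_mask, sr))  # 1.0
print(trimmed_to_original_time(1.0, keep_mask, sr))  # 3.0
```

In the toy example, second 1.0 of the trimmed audio lands at second 3.0 of the original, because two seconds of silence were removed before it.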

Nirannoel commented 1 year ago

How do I extract the timestamps?
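One possible way, as a sketch: as far as I can tell from the diarization demo in the repository, `encoder.embed_utterance(wav, return_partials=True)` also returns a list of slices over the preprocessed wav, one per partial embedding. The helper name `split_times` below is hypothetical, and the 16000 Hz default is an assumption about the sampling rate of the preprocessed audio. Note that the resulting times are on the trimmed timeline, not the original file's.

```python
def split_times(wav_splits, sampling_rate=16000):
    """Convert sample slices into (start, end) times in seconds.

    wav_splits is assumed to be a list of slice objects indexing the
    preprocessed (silence-trimmed) waveform.
    """
    return [(s.start / sampling_rate, s.stop / sampling_rate) for s in wav_splits]

# Stand-in slices; in practice they would come from the encoder.
example_splits = [slice(0, 16000), slice(8000, 24000)]
print(split_times(example_splits))  # [(0.0, 1.0), (0.5, 1.5)]
```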