Status: Open. teoh79 opened this issue 2 years ago.
@teoh79 I have noticed the same issue: the audio length in the output is shorter than the actual audio length. That is probably because silences in the input audio are trimmed when preprocess_wav is used. There are similar issues, #45 and #63. I am considering trimming the silences in the original audio as well before preprocessing, so that it matches the Resemblyzer output. This is also mentioned as the solution in #63, which says that wav is actually the trimmed audio. Other than that, I hope there are other solutions.
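A rough sketch of the alternative, mapping timestamps back instead of trimming the original: if you know which stretches of the original audio survived trimming, a time measured on the trimmed audio can be shifted back to the original timeline. This is not part of Resemblyzer. The function name trimmed_to_original and the kept_segments input are hypothetical, and as far as I know preprocess_wav does not return this information, so the kept segments would have to come from your own VAD step.

```python
from bisect import bisect_right
from itertools import accumulate


def trimmed_to_original(t, kept_segments):
    """Map time t (seconds, in the silence-trimmed audio) to the original audio.

    kept_segments is a hypothetical input: sorted, non-overlapping
    (start, end) pairs, in seconds of the ORIGINAL audio, that were kept.
    """
    if not kept_segments:
        raise ValueError("kept_segments is empty")
    durations = [end - start for start, end in kept_segments]
    # offsets[i] is where kept segment i begins inside the trimmed audio.
    offsets = [0.0] + list(accumulate(durations))
    if not 0.0 <= t <= offsets[-1]:
        raise ValueError("t lies outside the trimmed audio")
    # Locate the kept segment containing t; clamp so that t equal to the
    # total trimmed duration maps to the end of the last kept segment.
    i = min(bisect_right(offsets, t) - 1, len(kept_segments) - 1)
    return kept_segments[i][0] + (t - offsets[i])


# Example: 0.5 s of leading silence and 1.0 s of silence in the middle.
kept = [(0.5, 2.0), (3.0, 4.5)]
for t in (0.0, 1.0, 2.0, 3.0):
    print(t, "->", trimmed_to_original(t, kept))
```

A time that falls exactly on a cut is mapped to the start of the next kept segment, which is an arbitrary choice made for this sketch.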
How to extract the timestamps
Hello everybody, first of all, thanks to this community for supporting developers.
I tried the Resemblyzer diarization and got timestamps for each speaker that do not match the original files:
For example: 1/ the last timestamp doesn't correspond to the end time of the wav file, even if we speak until the end
2/ does removing silence shift every timestamp relative to the original wav file?
3/ is the original wav trimmed during the VAD process or any other step (segmentation, clustering, ...)?
Thanks in advance!