ai-nikolai opened 2 months ago
Thank you for responding @hbredin. I will try to add a minimal reproducible script in the coming days. In the meantime, I have a quick question.
pyannote/speaker-diarization-3.1?
As far as I know, pyannote converts any input audio to mono at 16 kHz. In my experience, audio recorded at a higher sample rate (44.1 kHz) generally performs better, because it retains more information even after downsampling to 16 kHz, whereas a file recorded at 16 kHz has less information to begin with.
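To make the "mono, 16 kHz" conversion concrete, here is a toy numpy sketch of that preprocessing step. It is not pyannote's own code: the function name `to_mono_16k`, channel averaging for the downmix, and FFT-truncation resampling are all illustrative assumptions. It does show one general fact: once audio is resampled to 16 kHz, nothing above 8 kHz (the Nyquist frequency at 16 kHz) can survive, whatever the original sample rate was.

```python
import numpy as np

def to_mono_16k(audio: np.ndarray, sample_rate: int, target_rate: int = 16_000) -> np.ndarray:
    """Hypothetical helper: downmix (channels, samples) audio and resample it.

    Toy stand-in for pyannote's preprocessing, NOT its actual implementation:
    channels are averaged, then the signal is resampled by truncating
    (or zero-padding) its spectrum.
    """
    # Downmix: average over the channel axis if the input is multichannel.
    mono = audio.mean(axis=0) if audio.ndim == 2 else audio
    n_in = mono.shape[-1]
    n_out = int(round(n_in * target_rate / sample_rate))
    spectrum = np.fft.rfft(mono)
    n_bins = n_out // 2 + 1
    # Keep only frequencies below the new Nyquist (target_rate / 2);
    # zero-pad the spectrum instead when upsampling.
    if n_bins <= spectrum.shape[0]:
        spectrum = spectrum[:n_bins]
    else:
        spectrum = np.pad(spectrum, (0, n_bins - spectrum.shape[0]))
    # irfft normalises by 1/n_out, so rescale to preserve the amplitude.
    return np.fft.irfft(spectrum, n=n_out) * (n_out / n_in)

# Usage: 1 s of "stereo" audio at 44.1 kHz with a 1 kHz and a 12 kHz tone.
sr = 44_100
t = np.arange(sr) / sr
left = np.sin(2 * np.pi * 1_000 * t) + 0.5 * np.sin(2 * np.pi * 12_000 * t)
stereo = np.stack([left, left.copy()])
mono16k = to_mono_16k(stereo, sr)
print(mono16k.shape)  # (16000,)
```

In this example the 12 kHz tone is removed entirely and only the 1 kHz tone remains in the 16 kHz output.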
Thank you, qalabeabbas49. What I find interesting is that the source audio file is the same in both cases, whether it was originally 44.1 kHz or 16 kHz. The original file is loaded at either 44.1 kHz or 16 kHz, and pyannote then converts it to 16 kHz (as you said). What makes the difference is loading the file at 44.1 kHz, not whether the file was originally recorded at 44.1 kHz.
(loading via `ffmpeg -ar 16000` or `ffmpeg -ar 44100`)
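For a reproduction script, the two loading variants can be sketched in Python by decoding through ffmpeg at a chosen rate. This is a minimal sketch under stated assumptions: the helper names `ffmpeg_cmd` and `load_with_ffmpeg` are hypothetical, and the `-ac 1` downmix and raw float32 output format are choices made here, not taken from the thread. Only `-ar` reflects the commands above.

```python
import subprocess
import numpy as np

def ffmpeg_cmd(path: str, sample_rate: int) -> list[str]:
    """Hypothetical helper: ffmpeg command decoding `path` to raw mono float32 on stdout."""
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-i", path,
        "-ac", "1",               # assumption: downmix to mono (pyannote downmixes anyway)
        "-ar", str(sample_rate),  # 16000 or 44100, the two variants being compared
        "-f", "f32le",            # raw little-endian float32 samples
        "-",                      # write to stdout instead of a file
    ]

def load_with_ffmpeg(path: str, sample_rate: int) -> np.ndarray:
    """Hypothetical helper: decode `path` at `sample_rate` into a 1-D float32 array."""
    result = subprocess.run(ffmpeg_cmd(path, sample_rate), capture_output=True, check=True)
    return np.frombuffer(result.stdout, dtype="<f4")

cmd_16k = ffmpeg_cmd("audio.wav", 16_000)
cmd_44k = ffmpeg_cmd("audio.wav", 44_100)
```

Each decoded array could then be passed to the pipeline in memory as `{"waveform": torch.from_numpy(samples.copy()).unsqueeze(0), "sample_rate": rate}`, the dict form the pyannote/speaker-diarization-3.1 model card describes for processing from memory. That keeps every step other than the ffmpeg rate identical between the two runs.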
Same for me: 44.1 kHz performs better.
Tested versions
Reproducible in 3.1, 3.3
System information
M2 Pro, 3.3
Issue description
Sample rate mismatch:
pyannote/speaker-diarization-3.1
Question:
@hbredin
Minimal reproduction example (MRE)
N/A