Closed: hmehdi515 closed this issue 4 months ago.
As the author has suggested previously, try tuning the parameters tau_active=0.5, rho_update=0.1 and delta_new=0.57. You can find more information about these parameters in the README and in the issues:
"Increase the value of delta_new, which is essentially the distance threshold (between speaker embedding and closest cluster centroid) to detect new speakers." https://github.com/juanmc2005/diart/discussions/171
You can find more about the parameters here: https://diart.readthedocs.io/en/latest/autoapi/diart/blocks/clustering/index.html https://diart.readthedocs.io/en/latest/autoapi/diart/blocks/diarization/index.html
TL;DR: please try lowering the tau_active threshold.
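To make the three parameters more concrete, here is a simplified, stdlib-only sketch of the kind of decisions they control. It is an illustration under stated assumptions, not diart's actual clustering code. It assumes cosine distance between embeddings, treats a local speaker as active when its peak activation in a chunk reaches tau_active, uses rho_update as a threshold on a speaker's mean activation when deciding whether its embedding may update a centroid, and creates a new speaker when the closest centroid is farther than delta_new. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def is_active(activations: Sequence[float], tau_active: float) -> bool:
    """Hypothetical rule: a local speaker counts only if its peak activation
    in the chunk reaches tau_active. A lower tau_active lets quieter or
    less certain speech through."""
    return max(activations) >= tau_active


def can_update_centroid(activations: Sequence[float], rho_update: float) -> bool:
    """Hypothetical rule: only speakers whose mean activation over the chunk
    reaches rho_update are allowed to refine a centroid."""
    return sum(activations) / len(activations) >= rho_update


def assign_speaker(
    embedding: Sequence[float],
    centroids: List[Sequence[float]],
    delta_new: float,
) -> Optional[int]:
    """Return the index of the closest centroid, or None when the embedding
    is farther than delta_new from every centroid (i.e. a new speaker)."""
    if not centroids:
        return None
    distances = [cosine_distance(embedding, c) for c in centroids]
    closest = min(range(len(distances)), key=distances.__getitem__)
    return closest if distances[closest] <= delta_new else None


if __name__ == "__main__":
    centroids = [[1.0, 0.0], [0.0, 1.0]]
    print(assign_speaker([0.9, 0.1], centroids, delta_new=0.57))   # 0
    print(assign_speaker([-1.0, 0.0], centroids, delta_new=0.57))  # None
    print(assign_speaker([1.0, 1.0], centroids, delta_new=0.2))    # None
```

In this sketch, a higher delta_new means an embedding must be farther from every known centroid before a new speaker is created, while a lower tau_active lets more frames count as speech.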
Update: decreasing the sample rate of the audio file gave more accurate transcriptions:
So, what's new, Mark? How is your new job going?
To be honest...
I can't complain.
I really can't complain.
I really love the company.
That I am working for.
My coworker...
...are all really...
friendly and helpful.
They really help...
me feel welcome.
I've found that 16000 Hz gives the most accurate results; now I'll play with the parameters.
Hi @hmehdi515, the sample rate supported by both whisper and diart is 16 kHz. Make sure all your audio is either loaded at that rate or resampled on the fly by diart (check out diart.utils.Resample).
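Before streaming a file into the pipeline, it can help to check the rate it was actually recorded at. This is a minimal stdlib sketch that assumes an uncompressed PCM .wav input (the standard wave module cannot read .mp3 or .mp4). The helper names are hypothetical, and the resampling itself is left to diart or another tool.

```python
import wave

# Rate expected by both whisper and diart, per the comment above.
TARGET_SAMPLE_RATE = 16_000


def wav_sample_rate(path: str) -> int:
    """Read the sample rate from a PCM .wav header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, target: int = TARGET_SAMPLE_RATE) -> bool:
    """True if the file is not already at the target rate."""
    return wav_sample_rate(path) != target


if __name__ == "__main__":
    import sys

    # Usage: python check_rate.py file1.wav file2.wav ...
    for audio_path in sys.argv[1:]:
        rate = wav_sample_rate(audio_path)
        status = "OK" if rate == TARGET_SAMPLE_RATE else "resample to 16 kHz"
        print(f"{audio_path}: {rate} Hz ({status})")
```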
I'm closing the issue as it looks resolved.
Hi. I'm trying to test this program with an audio file and can't seem to get rid of hallucinations, although transcription works perfectly by just using
The correct transcription it outputs (.srt file):
But when I run the diart_whisper program (same small model):
I am using a server with GPUs and CUDA, so I don't believe a slow CPU is the issue.
I've tried adjusting the chunk size with no success. I'm also not sure whether parameter tuning is the issue, since it works from the command line. Could something be going wrong in the diarization part?
I have attached the audio file as a .mp4 since I can't attach a .mp3 or .wav file; you can change the extension back to .wav.
https://github.com/juanmc2005/diart/assets/170109983/78e41776-2460-42f9-97f1-08a805be2ec6
The code:
Any help is appreciated. Thanks