Diarization process with faster-whisper

I have a working application with a real-time transcription feature based on faster-whisper. However, after adding the diart pipeline to it, I get a transcription with no diarization. For this audio, I expect the output to look like this:

Expected output:

Speaker 0:

Hi. It's Pat. Can I help you?

Speaker 1:

Well, not really.

Speaker 0:

Okay. And what Is this Brandy?

Speaker 1:

Just say there's somebody on the line that needs help?

Speaker 0:

No. Is this Brandy?

Speaker 1:

Yeah?

Speaker 0:

Yeah. Hi. It's Pat.

Actual output:

hi it's pat can i help you uh well not really okay just say there's somebody on the line that needs help no is this brandy yeah yeah hi it's pat

It looks like diart is not working as expected with faster-whisper: the transcription comes through, but none of it is labeled with speaker information.

Can anybody confirm if this is the case?
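
To be concrete about the labeling step I expected, here is a minimal sketch of it in plain Python. It assumes the transcription yields timestamped text segments and the diarization yields timestamped speaker turns. The `Segment` and `Turn` classes, the `label_segments` helper, and the timestamps in the demo are all hypothetical and written only for illustration; none of them are part of the diart or faster-whisper APIs. Each text segment gets the speaker whose turn overlaps it the most, and consecutive segments from the same speaker are merged.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed chunk of text with start/end times in seconds (hypothetical)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization speaker turn with start/end times in seconds (hypothetical)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Assign each segment the speaker with the largest time overlap.

    Returns a list of (speaker, text) pairs, merging consecutive
    segments that were assigned the same speaker.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "Unknown", 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        if labeled and labeled[-1][0] == best_speaker:
            # Same speaker as the previous segment: extend that entry.
            labeled[-1] = (best_speaker, labeled[-1][1] + " " + seg.text)
        else:
            labeled.append((best_speaker, seg.text))
    return labeled


if __name__ == "__main__":
    # Made-up timestamps, only to show the shape of the data.
    demo_segments = [
        Segment(0.0, 2.0, "Hi. It's Pat. Can I help you?"),
        Segment(2.2, 3.5, "Well, not really."),
    ]
    demo_turns = [
        Turn(0.0, 2.1, "Speaker 0"),
        Turn(2.1, 3.6, "Speaker 1"),
    ]
    for speaker, text in label_segments(demo_segments, demo_turns):
        print(f"{speaker}:\n\n{text}\n")
```

With only the transcription side available, every segment would fall through to "Unknown", which matches the unlabeled text I am currently getting.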

juanmc2005 / diart

Diarization process with faster-whisper #236
