Closed · mirix closed this issue 1 year ago
Also, in some cases, utterances from different speakers are glued into the same segment. But I need to test several files to see if the prevalence of this issue is higher or lower than it is with other approaches.
> I am testing whisper-timestamped but the output is neither punctuated nor capitalised.
There is no particular reason why this should happen: Whisper is a statistical model. Can you try with other model sizes? Also with vad=False?
If it persists, can you share the audio and tell me which model size you used?
> utterances from different speakers are glued into the same segment.
Yes, also because it's a statistical model, trained mostly on YouTube subtitles, where the segmentation into subtitles depends on sentence lengths, and not necessarily on speaker turns. Whisper was not trained to do speaker diarization. I guess the speaker turns must be quite short for you to observe that?
Here also, using vad=False might help.
The issue has vanished.
Hi,
I am testing whisper-timestamped but the output is neither punctuated nor capitalised.
Here is my code:
Is there a specific option or do I need to use json.dump or something?
Best,
Ed
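
For what it's worth, `json.dump` only controls how the result is serialised; punctuation and capitalisation come from the model itself. A minimal sketch of writing a transcription result to JSON, where the `result` dict is a hypothetical stand-in shaped like the usual Whisper-style output (`"text"` plus `"segments"`), since no audio is actually transcribed here:

```python
import json

# Hypothetical result dict, modelled on the Whisper-style output
# ("text" plus per-segment entries); stands in for the real transcription.
result = {
    "text": "hello world",
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.2, "text": "hello world"},
    ],
}

# json.dump writes the dict to a file; indent=2 pretty-prints it and
# ensure_ascii=False keeps accented characters readable instead of escaped.
with open("transcript.json", "w", encoding="utf-8") as f:
    json.dump(result, f, indent=2, ensure_ascii=False)

# The same serialisation as a string, e.g. for printing or logging:
print(json.dumps(result, indent=2, ensure_ascii=False))
```

Note that neither call changes the transcript text itself, so a lowercase, unpunctuated transcription stays that way regardless of how it is dumped.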