Closed Quidam2k closed 1 year ago
Interesting, I've never seen ">>" in transcriptions. The Whisper AI is trained on many hours of transcribed audio data, for the most part videos with subtitles. So Whisper sometimes produces weird artifacts that might have been present in some of its training data, like this ">>" notation. Anyhow: I have just released a new version of noScribe (0.3) that also tries to improve speaker separation, especially in cases like yours with quick changes. It's not perfect, but you might give it a try. The basic principle is to look at smaller chunks of audio to get a more fine-grained speaker separation. You can reduce the chunk size even further by changing the "max-len" value in the advanced options (see the readme for how to change these options).
I've been working with it more, and am seeing "-" being used in a similar fashion, where Whisper clearly caught the change in speaker but the speaker separator didn't until the next chunk began. I'll try further reducing the "max-len" value, but I'm also wondering whether those characters, when they're there, could be used to help the speaker ID code do a better job. Seems a shame to waste that data.
Interesting observations. I think, though, that this is not consistent enough to be used in the program, but we will see. The speaker separation needs more work, I agree with that. The problem is not so much pyannote (the "speaker separator") but the fact that Whisper's timestamps are not very precise. This makes the synchronisation between pyannote and Whisper difficult, especially if speakers change quickly. If you are interested in the raw output of pyannote, see the log file for your transcript (last bullet point in Advanced options).
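To illustrate why imprecise timestamps hurt here, below is a minimal sketch of one common way to combine the two outputs: give each transcribed segment the speaker whose turn overlaps it the most. This is not taken from noScribe's code; the `Turn` and `Segment` classes, the `assign_speaker` function, and the sample times are all hypothetical and only meant to show the effect.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """A speaker turn as a diarization step might report it (hypothetical shape)."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """A transcribed piece of text with start/end times in seconds (hypothetical shape)."""
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(segment, turns):
    """Return the speaker whose turn overlaps the segment the most, or None."""
    best_speaker = None
    best_overlap = 0.0
    for turn in turns:
        ov = overlap(segment.start, segment.end, turn.start, turn.end)
        if ov > best_overlap:
            best_speaker, best_overlap = turn.speaker, ov
    return best_speaker


# Made-up example: S01 talks from 0 to 4 s, S02 from 4 to 6 s.
turns = [Turn(0.0, 4.0, "S01"), Turn(4.0, 6.0, "S02")]

# The same sentence, once with exact timestamps and once reported 1.2 s too early.
exact = Segment(4.0, 6.0, "I roll for initiative.")
drifted = Segment(2.8, 4.8, "I roll for initiative.")

print(assign_speaker(exact, turns))    # S02
print(assign_speaker(drifted, turns))  # S01 (wrong speaker, caused only by the drift)
```

With turns only a couple of seconds long, a timestamp error of about a second is enough to hand the sentence to the previous speaker, which matches the behaviour described above when speakers change quickly.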
Yeah, I kind of figured it wouldn't be a simple matter. I'll keep playing with the "max-len" value and see if I can bring it closer to true.
It might be worthwhile to write a script instead that will go through a noScribe-generated text file looking for a supplied delimiter character and move the indicated text to the next linebreak after the colon. If I give it a go and have luck I'll let you know.
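A rough sketch of what such a script could look like, under some loud assumptions: each line of the transcript is taken to have the form "Label: text" with no timestamps, the delimiter defaults to ">>", and "move the indicated text" is read as "move everything after the delimiter to the start of the next speaker's text, right after the colon". The function name `move_after_delimiter` and the sample lines are made up for illustration; the real noScribe output may need different parsing.

```python
def move_after_delimiter(lines, delimiter=">>"):
    """Move the text that follows `delimiter` to the start of the next line's text.

    Assumes every line looks like "Label: text". Only the first delimiter in a
    line is handled for that line; moved text is scanned again when the next
    line is processed.
    """
    result = list(lines)
    for i in range(len(result) - 1):
        head, sep, tail = result[i].partition(delimiter)
        if not sep:
            continue  # no delimiter on this line
        label, colon, rest = result[i + 1].partition(":")
        if not colon:
            continue  # next line has no "Label:" prefix, leave both lines alone
        result[i] = head.rstrip()
        result[i + 1] = f"{label}: {tail.strip()} {rest.strip()}".rstrip()
    return result


# Made-up sample lines, not real noScribe output.
sample = [
    "S01: I open the door. >> What do you see?",
    "S02: A long hallway.",
]

for line in move_after_delimiter(sample):
    print(line)
# S01: I open the door.
# S02: What do you see? A long hallway.
```

To run it on a file, the lines could be read with `open(path, encoding="utf-8").read().splitlines()` and the result written back with `"\n".join(...)`. A delimiter on the last line is left untouched because there is no following line to receive the text.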
"looking for a supplied delimiter character and move the indicated text to the next linebreak"
You might be able to achieve this with a clever use of search & replace in Word (in multiple steps, you can also turn this into a macro).
I have just released version 0.4b with much improved speaker separation. Give it a try. If you still have problems, please open a new issue.
I've noticed that sometimes the ability of the software to distinguish different voices seems a bit inconsistent. I'll notice ">>" in the text and can tell they are meant to indicate a change in the speaker, but they're appearing in the middle of one speaker's chunk of text. Here's an example from the D&D game I was transcribing: