CuirPork opened this issue 2 months ago (status: Open)
Interesting. I have seen this kind of behavior when my audio file is not mono, was wrongly converted to mono, or is very noisy.
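In case it helps anyone rule out the mono question, here is a minimal sketch for checking the channel count of an input file. It assumes the audio has already been exported as an uncompressed WAV file and uses only Python's standard `wave` module; the helper names `channel_count` and `is_mono` are made up for illustration and are not part of Vibe.

```python
import wave


def channel_count(path):
    """Return the number of audio channels in a WAV file."""
    # wave.open in "rb" mode reads the header; getnchannels() reports
    # 1 for mono and 2 for stereo.
    with wave.open(path, "rb") as wav:
        return wav.getnchannels()


def is_mono(path):
    """Return True if the WAV file at `path` has exactly one channel."""
    return channel_count(path) == 1
```

If `is_mono("input.wav")` returns `False`, downmixing the file to a single channel before transcribing would be one thing to try. This only tests the channel count; it says nothing about noise or a bad conversion.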
+1, same behavior. I turned off the timestamp option and it now transcribes perfectly. Using Windows 11 with an Nvidia card.
Same behaviour (Windows 11, AMD), solved by turning off word-level timestamps.
Thank you all for writing!
Word-level timestamps weren't enabled by default, right? Did you enable them manually?
@CuirPork
Does disabling it fix the issue?
Can you share a link to the video or audio? You can share a YouTube link, or upload the file to Google Drive and share the link here.
Turning it off fixed it. Don't remember if it was enabled by default.
It seems to be caused by the combination of diarisation and word-level timestamps.
When I disable word-level timestamps and enable diarisation, I get output with diarisation. When I enable word-level timestamps and disable diarisation, I get output with word-level timestamps. When I enable both, I get blank utterances, with a speaker listed for each blank utterance.
I'm using an Apple M1 chip.
I can also confirm this bug appears even on small files.
I downloaded the Vibe software, and it then downloaded the OpenAI model. While that was downloading, I looked through the options, saw one for identifying speakers, and clicked it. That appeared to launch a new window with a message saying it needed to download the extended library for the OpenAI model, so I left it alone.
Once it was done installing locally, I added a local file that was bodycam footage of a police officer interviewing a motorist and a bicyclist who had been involved in a collision.
It took quite a while before I finally saw "SPEAKER 1:", but with no timestamp or text. A little more time passed and "SPEAKER 2:" appeared, again with no timestamp or text. Fast forward about 2 hours and Vibe claimed it was done transcribing the 20-minute video. However, the only thing in the text file I saved was the "SPEAKER 1:" and "SPEAKER 2:" labels. No text, no timestamps. Just the speaker separations.
I posted this to Reddit and was asked to report it here. Hope this helps, lemme know if I can answer any questions. Thanks.