Open noahuser opened 7 months ago
This seems to be a duplicate of https://github.com/linto-ai/whisper-timestamped/issues/94 You have some suggestion there.
Repetitions are due to model hallucination. Which model are you using?
I am using the large model. I already tried everything in #94. I have two MacBooks, one Intel i7 the other one M2 Pro. I tried the same audio with the Intel one, it functions perfectly w/out any Issue. The one with M2 Pro does this hallucinations every time. In the M2 one I "solved" the problem putting this code on my Python (result = transcribe_timestamped(model, audio_file, beam_size=5, best_of=5, temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0))). But usually it takes 1 hour to transcribe the audio, with this kind of setting it actually functions very well, but it takes something like 6 hours pro audio. That's not a big problem, but when It functioned before in an hour it was really beautiful :)
OK, when you say "I use the large model", you have to know there are several versions of the large model (now there are 3).
So if you use model = whisper.load_model("large")
in your code, without specifying the version, that might load the latest version.
Which could explain why you experienced a difference of behaviour suddenly.
You can specify the exact version to use, with, e.g. model = whisper.load_model("large-v1")
(or "large-v2"
whatever was the last one when "it worked flawlessly")
I've been using Whisper-timestamped for some time and it worked flawlessly. However, after a few months during which I updated my Mac to Sonoma, I've encountered a recurring issue upon returning to use the tool. The transcription process appears to proceed normally, with the loading bar reaching 100% as expected. Yet, at a certain point, the transcription process gets stuck and begins looping the same sentence over and over again until the end of the audio file. For instance, at 00:38:03, it transcribes a sentence and then repeats this sentence in a loop until 01:30:03, which is when the audio ends. Initially, I suspected an issue with the audio file itself, but the problem persists across different audio files, including one that was previously transcribed perfectly a few months ago. Interestingly, the exact timing of when the loop starts varies with each attempt. I am at a loss on how to resolve this issue. Does anyone have any suggestions or insights?
I have already tried to enable VAD, nothing changed. I already tried to uninstall and reinstall whisper-timestamped, nothing changed.