linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0
2.01k stars 156 forks

error for long (1 hr) hindi video - used large-v2 whisper model #87

Closed rairavi closed 1 year ago

rairavi commented 1 year ago

 97%|█████████▋| 157296/162500 [40:30<01:17, 67.37frames/s]
 99%|█████████▊| 160296/162500 [40:55<00:32, 67.39frames/s]
 99%|█████████▊| 160296/162500 [41:10<00:32, 67.39frames/s]
100%|██████████| 162500/162500 [41:40<00:00, 61.43frames/s]
100%|██████████| 162500/162500 [41:40<00:00, 64.98frames/s]
Got inconsistent length for segment 48 (49 != 19). Some words have been ignored.
Traceback (most recent call last):
  File "/data/p/code.py", line 84, in <module>
    result = transcribe(video_converted,language)
  File "/data/p/codeTranscript.py", line 15, in transcribe
    return transcribe_timestamped(audio,language)
  File "/data/p/codeTranscript.py", line 33, in transcribe_timestamped
    result = whisper_timestamped.transcribe(model, audio, language, fp16=False, verbose=False)
  File "/data/p/venv-3.10/lib/python3.10/site-packages/whisper_timestamped/transcribe.py", line 264, in transcribe_timestamped
    transcription, words = remove_last_null_duration_words(transcription, words, recompute_text=True)
  File "/data/p/venv-3.10/lib/python3.10/site-packages/whisper_timestamped/transcribe.py", line 1889, in remove_last_null_duration_words
    raise RuntimeError(f"\"{text}\" not ending with \"{full_word}\"")
RuntimeError: " पर्टिएजार के लिए पर्टिएजार के लिए पर्टिएजार के लिए पर्टिएजार के लिए पर्टिएजार के लिए पर्टिएजार के लिए पर्टिएजार के लिए पर्टिएजार के लिए पर्टिएजार के लिए पर्टिएजार के लिए पर्टिएजार के लिए �" not ending with " लिए"

Jeronymous commented 1 year ago

Thanks for reporting. I would need a way to reproduce this in order to investigate: ideally the audio file, the exact set of options used, and whether it runs on a GPU or CPU device.

Can you at least give the whisper version you use?

whisper_timestamped --versions

rairavi commented 1 year ago

whisper_timestamped --versions
1.12.17 -- Whisper 20230314 in /Users/python/venv-3.10/lib/python3.10/site-packages/whisper

CPU (MacBook M2), large-v2 model. Only the timestamp=true option is set; the rest is default.

https://www.youtube.com/watch?v=rn64Vf6GEoo

Jeronymous commented 1 year ago

Thank you @rairavi

I've just disabled a dangerous heuristic that was causing this issue, and that was also possibly removing relevant words from the transcriptions. (It consisted of removing words with empty duration, on the assumption that they came from Whisper's hallucinations.)

So in a sense, that issue is solved.
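
For readers who want to see the failure mode, here is a minimal, self-contained sketch of a heuristic of that kind. It is not the code from whisper_timestamped: the function name `drop_trailing_null_words`, the `Word` type and the `strict` flag are all hypothetical and exist only for illustration. It shows how a strict "text must end with the word" check can raise when the segment text ends with a stray U+FFFD character, as in the report above, and how the heuristic drops a word with zero duration even though it may be a real word.

```python
from typing import List, Tuple, TypedDict


class Word(TypedDict):
    text: str
    start: float  # seconds
    end: float    # seconds


def drop_trailing_null_words(
    text: str, words: List[Word], strict: bool = True
) -> Tuple[str, List[Word]]:
    """Remove trailing words whose duration is zero (illustrative only)."""
    kept = list(words)
    while kept and kept[-1]["end"] <= kept[-1]["start"]:
        last = kept.pop()["text"].strip()
        trimmed = text.rstrip()
        if trimmed.endswith(last):
            # Normal case: the segment text really ends with the word.
            text = trimmed[: len(trimmed) - len(last)].rstrip()
        elif strict:
            # Text and word list disagree, e.g. the text ends with a stray
            # U+FFFD produced by an incomplete multi-byte token.
            raise RuntimeError(f'"{text}" not ending with "{last}"')
        else:
            # Tolerant fallback: rebuild the text from the remaining words.
            text = " ".join(w["text"].strip() for w in kept)
    return text, kept


if __name__ == "__main__":
    words: List[Word] = [
        {"text": "के", "start": 1.0, "end": 1.2},
        {"text": "लिए", "start": 1.2, "end": 1.2},  # zero duration
    ]
    # The zero-duration word is dropped, whether or not it was really spoken.
    print(drop_trailing_null_words("के लिए", words))
    # With a trailing U+FFFD the strict check fails; strict=False rebuilds.
    print(drop_trailing_null_words("के लिए \ufffd", words, strict=False))
```

In this sketch the second word disappears from the result in both calls, which is the "removing relevant words" risk mentioned above; calling it with the default `strict=True` on the text ending in U+FFFD raises a `RuntimeError` of the same shape as the one in the traceback.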