linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0
1.87k stars, 150 forks

[Bug] remove_last_null_duration_words #62

Closed: mmichelli closed this issue 1 year ago

mmichelli commented 1 year ago

I got this error:

Using cache found in /home/mario/.cache/torch/hub/snakers4_silero-vad_master
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 305493/305493 [14:31<00:00, 350.46frames/s]
An additional token was added on segment 33
An additional token was added on segment 39
An additional token was added on segment 43
An additional token was added on segment 44
An additional token was added on segment 45
An additional token was added on segment 60
An additional token was added on segment 65
An additional token was added on segment 91
An additional token was added on segment 92
Traceback (most recent call last):
  File "/home/mario/code/video_transcriptions/app.py", line 14, in <module>
    result = whisper.transcribe(model, audio, language="no", vad=True)
  File "/home/mario/mambaforge/envs/video_transcriptions/lib/python3.9/site-packages/whisper_timestamped-1.12.1-py3.9.egg/whisper_timestamped/transcribe.py", line 264, in transcribe_timestamped
    transcription, words = remove_last_null_duration_words(transcription, words, recompute_text=True)
  File "/home/mario/mambaforge/envs/video_transcriptions/lib/python3.9/site-packages/whisper_timestamped-1.12.1-py3.9.egg/whisper_timestamped/transcribe.py", line 1822, in remove_last_null_duration_words
    assert text.endswith(full_word)

Jeronymous commented 1 year ago

Thanks for reporting.

Can you update to the latest version and retry? Maybe it's fixed; otherwise it should at least give a bit more information:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/linto-ai/whisper-timestamped

If it persists, is there any chance you could share the audio it fails on (and tell me which model you are using)?

mmichelli commented 1 year ago

Updated, but it still has the same issue.

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 305493/305493 [14:58<00:00, 339.81frames/s]
An additional token was added on segment 33
An additional token was added on segment 39
An additional token was added on segment 43
An additional token was added on segment 44
An additional token was added on segment 45
An additional token was added on segment 60
An additional token was added on segment 65
An additional token was added on segment 91
An additional token was added on segment 92
Traceback (most recent call last):
  File "/home/mario/code/video_transcriptions/app.py", line 14, in <module>
    result = whisper.transcribe(model, audio, language="no", vad=True)
  File "/home/mario/mambaforge/envs/video_transcriptions/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 264, in transcribe_timestamped
    transcription, words = remove_last_null_duration_words(transcription, words, recompute_text=True)
  File "/home/mario/mambaforge/envs/video_transcriptions/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 1836, in remove_last_null_duration_words
    assert text.endswith(full_word), f"\"{text}\" not ending with \"{full_word}\""
AssertionError: " Vi er der vi er. Hvis kommunestyret i dag vedtar at tuen var inhabil, må saken behandles på nytt. Det kunne være fristende for oss i Hålista, siden vi hele tiden har ønsket et annet vedtak enn det som ble vedtatt i kommunestyret. Men vi vil ikke blande kort. Denne saken handler om habilitetsspørsmålet, ikke om en omkamp i en reguleringssak. At vi i denne situasjonen skulle velge å overprøve kommunestyrets vurdering, blir en trygghet." not ending with " trygghet.."
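The failing check boils down to a plain `str.endswith` comparison: the last word, as reported in the message, is `" trygghet.."` with two periods, while the segment text ends with a single period. Below is a minimal pure-Python reproduction using only the last sentence of the reported segment text. `check_ending` is a hypothetical stand-in that copies the condition and message of the assert at line 1836 in the traceback; it is not the library's code. Like any `assert`, it is skipped when Python runs with `-O`.

```python
def check_ending(text: str, full_word: str) -> None:
    # Hypothetical stand-in: same condition and message format as the assert
    # shown at transcribe.py line 1836 in the traceback.
    assert text.endswith(full_word), f"\"{text}\" not ending with \"{full_word}\""


# Last sentence of the segment text from the error above; only the ending matters.
segment_text = " At vi i denne situasjonen skulle velge å overprøve kommunestyrets vurdering, blir en trygghet."
full_word = " trygghet.."  # word text as reported in the error: note the doubled period

print(segment_text.endswith(" trygghet."))  # True: the segment ends with a single period
print(segment_text.endswith(full_word))     # False: the word carries one period too many

message = None
try:
    check_ending(segment_text, full_word)
except AssertionError as err:
    message = str(err)  # same shape as the AssertionError in the log
print(message)
```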

mmichelli commented 1 year ago

I can't upload the files. They are both too large.

Jeronymous commented 1 year ago

Thanks for posting the new output. I will think about a solution.

mmichelli commented 1 year ago

File whisper_timestamped/transcribe.py:1836, in remove_last_null_duration_words(transcription, words, recompute_text)
   1834 segment = transcription["segments"][idx_segment]
   1835 text = segment["text"]
-> 1836 assert text.endswith(full_word), f"\"{text}\" not ending with \"{full_word}\""
   1837 text = text[:-len(full_word)]
   1838 if text:

AssertionError: " Hvor skal brevet havne? På et postmottak? Et eller annet terminal? Mens en finner ut hvem som er mottakeren, eller hvem som er avsenderen, slik at en kan returnere brevet tilbake til sitt opphav. Hvor lenge tid tar en slik prosess? Hvor lenge skal brevet være på et postmottak? Et terminal. Hvor lenge skal brevet slenges fra hylle til hylle? Det er støver. Det er fuktig. Det er gjennomtrekk. Det er mange år." not ending with " år.."
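This second failure has the same shape: the word is `" år.."` while the segment text ends with `"år."`. One way to make the `text = text[:-len(full_word)]` step shown above tolerant of a duplicated trailing punctuation mark is to retry the suffix match with trailing punctuation stripped. The sketch below uses a hypothetical helper, `strip_last_word`, and is not necessarily how the maintainers fixed the issue; it only illustrates the idea, and `string.punctuation` covers ASCII punctuation only.

```python
import string


def strip_last_word(text: str, full_word: str) -> str:
    """HYPOTHETICAL helper: remove full_word from the end of a segment text.

    Tries an exact suffix match first (what the failing assert requires).
    If that fails, it compares again with trailing ASCII punctuation stripped
    from both strings, so a word like " år.." still matches text ending in " år.".
    Raises ValueError when neither comparison matches.
    """
    if not full_word:
        return text  # nothing to remove; avoids text[:-0] returning ""
    if text.endswith(full_word):
        return text[: -len(full_word)]
    core = full_word.rstrip(string.punctuation)
    trimmed = text.rstrip(string.punctuation)
    if core and trimmed.endswith(core):
        return trimmed[: -len(core)]
    raise ValueError(f"{text!r} does not end with {full_word!r}")


print(repr(strip_last_word(" Det er gjennomtrekk. Det er mange år.", " år..")))
# -> ' Det er gjennomtrekk. Det er mange'
```

The fallback is deliberately narrow: only trailing punctuation is ignored, so a genuine mismatch in the word itself still raises instead of silently cutting the wrong text.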

Jeronymous commented 1 year ago

Thanks again, @mmichelli. This should now be fixed in version 1.12.4.
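To confirm that an environment already has release 1.12.4 or later, a small check like the following may help. It is a sketch: the distribution name `whisper-timestamped` passed to `importlib.metadata.version` is an assumption (the first traceback shows an egg named `whisper_timestamped-1.12.1`), and the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

FIXED_IN = (1, 12, 4)  # release reported in this thread as containing the fix


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version string such as "1.12.1" into (1, 12, 1).

    Only the leading digits of each dot-separated piece are kept, and parsing
    stops at the first piece without leading digits, so "1.12.4.dev0" -> (1, 12, 4).
    This is a rough comparison, not a full PEP 440 parser.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"[0-9]+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_fix(installed: str) -> bool:
    """True if the installed version is at least FIXED_IN (tuple comparison)."""
    return parse_version(installed) >= FIXED_IN


def installed_version(dist_name: str = "whisper-timestamped") -> Optional[str]:
    """Return the installed version, or None if the distribution is not found.

    ASSUMPTION: "whisper-timestamped" is the distribution name pip registered.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_version()
    if current is None:
        print("whisper-timestamped does not appear to be installed")
    else:
        print(current, "includes the fix" if has_fix(current) else "predates 1.12.4")
```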

mmichelli commented 1 year ago

Thanks :)