linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0

assert l1 == l2 or l1 == 0, f"Inconsistent number of segments: whisper_segments ({l1}) != timestamped_word_segments ({l2})" #205

Open hinswhale opened 1 month ago

hinswhale commented 1 month ago

I have run into this problem several times. What can I do to fix it? Thanks. Perhaps we should implement a feature to temporarily save transcribed files, so that we can double-check the results and make sure previous work isn't lost.
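Until something like this exists in the tool, a batch script can protect finished work itself. The following is a minimal sketch, not part of whisper-timestamped: `transcribe_with_checkpoints`, `transcribe_one`, and the `.json` naming are hypothetical names chosen for illustration. It writes each file's result to disk as soon as it is ready, skips files that already have a result, and records an `AssertionError` instead of aborting the whole run.

```python
import json
from pathlib import Path


def transcribe_with_checkpoints(audio_paths, transcribe_one, out_dir):
    """Transcribe files one by one, saving each result as JSON right away.

    `transcribe_one` is any callable (hypothetical here) that takes an audio
    path and returns a JSON-serializable dict. Returns a dict mapping each
    failed path to its error message.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failures = {}
    for audio_path in audio_paths:
        out_file = out_dir / (Path(audio_path).name + ".json")
        if out_file.exists():
            # Already transcribed in an earlier run: keep the saved result.
            continue
        try:
            result = transcribe_one(audio_path)
        except AssertionError as err:
            # E.g. "Inconsistent number of segments": note it and move on.
            failures[str(audio_path)] = str(err)
            continue
        # Write to a temporary file first so an interrupted run never
        # leaves a half-written result behind.
        tmp_file = out_file.with_name(out_file.name + ".tmp")
        tmp_file.write_text(
            json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        tmp_file.replace(out_file)
    return failures
```

With the Python API, `transcribe_one` could be something like `lambda path: whisper_timestamped.transcribe(model, path)`. Note that this only protects work across files; it does not recover partial output from a single file that fails midway.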


WARNING:whisper_timestamped:Inconsistent number of segments: whisper_segments (339) != timestamped_word_segments (340)
Traceback (most recent call last):
  File "/usr/local/bin/whisper_timestamped", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/whisper_timestamped/transcribe.py", line 3097, in cli
    result = transcribe_timestamped(
  File "/usr/local/lib/python3.10/dist-packages/whisper_timestamped/transcribe.py", line 296, in transcribe_timestamped
    (transcription, words) = _transcribe_timestamped_efficient(model, audio,
  File "/usr/local/lib/python3.10/dist-packages/whisper_timestamped/transcribe.py", line 920, in _transcribe_timestamped_efficient
    assert l1 == l2 or l1 == 0, f"Inconsistent number of segments: whisper_segments ({l1}) != timestamped_word_segments ({l2})"
AssertionError: Inconsistent number of segments: whisper_segments (339) != timestamped_word_segments (340)

KillerX commented 2 weeks ago

I just started seeing this. Did you perchance recently start using a different Whisper model?

Jeronymous commented 2 weeks ago

There is an open discussion on this: https://github.com/linto-ai/whisper-timestamped/discussions/79#discussioncomment-10405887

It seems to be a corner case that happens when the Whisper model predicts a transcript consisting only of special language tokens, repeated up to the maximum token length (e.g. <|0.00|><|de|><|de|><|de|><|de|><|de|>...).

I am just waiting to have a quick way to reproduce this corner case, to be able to fix it safely.
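To help spot candidate segments for a reproduction, the pattern described above can be checked on a decoded string. This is only a sketch under assumptions: it supposes special tokens are rendered as text in the form `<|...|>`, that language tokens are 2 or 3 lowercase letters, and the function name `is_only_language_tokens` is made up for illustration. It is not how whisper-timestamped detects anything.

```python
import re

# Matches one special token such as <|0.00|> or <|de|> and captures its content.
SPECIAL_TOKEN = re.compile(r"<\|([^|]*)\|>")
TIMESTAMP = re.compile(r"\d+\.\d+")
LANGUAGE = re.compile(r"[a-z]{2,3}")  # assumption: codes like "de", "en", "haw"


def is_only_language_tokens(decoded: str) -> bool:
    """Return True if `decoded` holds no text, only timestamps and language tokens."""
    tokens = SPECIAL_TOKEN.findall(decoded)
    leftover_text = SPECIAL_TOKEN.sub("", decoded).strip()
    if leftover_text or not tokens:
        # Real words are present, or there are no special tokens at all.
        return False
    non_timestamps = [t for t in tokens if not TIMESTAMP.fullmatch(t)]
    # Require at least one language token, and nothing but language tokens.
    return bool(non_timestamps) and all(LANGUAGE.fullmatch(t) for t in non_timestamps)
```

For example, `is_only_language_tokens("<|0.00|><|de|><|de|><|de|>")` is true, while a segment that contains actual words between timestamps is not flagged.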