m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

IndexError: index 34 is out of bounds for dimension 0 with size 34 #412

Open arttukataja opened 1 year ago

arttukataja commented 1 year ago

I frequently get the following fatal error when running WhisperX on longer Finnish-language files:

Traceback (most recent call last):
  File "/home/arttu/miniconda3/envs/whisperx/bin/whisperx", line 8, in <module>
    sys.exit(cli())
  File "/home/arttu/miniconda3/envs/whisperx/lib/python3.10/site-packages/whisperx/transcribe.py", line 187, in cli
    result = align(result["segments"], align_model, align_metadata, input_audio, device, interpolate_method=interpolate_method, return_char_alignments=return_char_alignments)
  File "/home/arttu/miniconda3/envs/whisperx/lib/python3.10/site-packages/whisperx/alignment.py", line 216, in align
    trellis = get_trellis(emission, tokens, blank_id)
  File "/home/arttu/miniconda3/envs/whisperx/lib/python3.10/site-packages/whisperx/alignment.py", line 350, in get_trellis
    trellis[t, :-1] + emission[t, tokens],
IndexError: index 34 is out of bounds for dimension 0 with size 34

I was able to work around the error by adding the following code blocks to alignment.py. I wrote the fix with the help of ChatGPT and don't fully understand the context. I hope the quick fix described here makes the problem easy to understand and to fix properly.

Code change 1, snippet starting at row 211 of alignment.py

        blank_id = 0
        for char, code in model_dictionary.items():
            if char == '[pad]' or char == '<pad>':
                blank_id = code

        trellis = get_trellis(emission, tokens, blank_id)
        # fix by Arttu 8.8.2023
        if trellis is None:
            print("trellis error, resorting to original")
            aligned_segments.append(aligned_seg)
            continue
        # /fix by Arttu 8.8.2023

        path = backtrack(trellis, emission, tokens, blank_id)
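The fallback idea in code change 1 (keep the original, unaligned segment when alignment fails instead of aborting the whole run) can be sketched in isolation. This is only an illustration under assumptions: `align_with_fallback`, `align_one`, and the `Segment` alias are hypothetical names that do not exist in WhisperX, and treating `IndexError` as a failure signal is an extra assumption on top of the `None` return value used above.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a transcript segment; not a WhisperX type.
Segment = Dict[str, object]


def align_with_fallback(
    segments: List[Segment],
    align_one: Callable[[Segment], Optional[Segment]],
) -> List[Segment]:
    """Align each segment; keep the original segment when alignment fails.

    `align_one` is a hypothetical per-segment aligner that returns None
    (or raises IndexError) when it cannot align a segment.
    """
    aligned: List[Segment] = []
    for seg in segments:
        try:
            result = align_one(seg)
        except IndexError:
            # Treat an out-of-bounds token index like a failed alignment.
            result = None
        if result is None:
            print(f"alignment failed, keeping original segment: {seg.get('text')!r}")
            aligned.append(seg)
        else:
            aligned.append(result)
    return aligned
```

With this shape, one bad segment costs only its own word-level timestamps, and the rest of the file is still aligned.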

Code change 2, snippet starting at row 339 of alignment.py

def get_trellis(emission, tokens, blank_id=0):
    num_frame = emission.size(0)
    num_tokens = len(tokens)

    # fix by Arttu added 8.8.2023
    # Check if any token index is out of bounds
    max_token = max(tokens)
    if max_token >= emission.size(1):
        print("get_trellis error")
        return None
    # /fix by Arttu added 8.8.2023
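The bounds check above compares the largest token id against the size of the emission matrix's second dimension. The message "index 34 is out of bounds for dimension 0 with size 34" would be consistent with a token id of 34 being looked up in 34 emission columns, though that reading is a guess and has not been verified against the align model. A standalone sketch of the same check, which also reports which positions are affected, could look like the following. `find_out_of_range_tokens` is a hypothetical helper, not part of WhisperX, and rejecting negative ids is an assumption made here for simplicity.

```python
from typing import List, Sequence


def find_out_of_range_tokens(tokens: Sequence[int], vocab_size: int) -> List[int]:
    """Return the positions in `tokens` whose id cannot index `vocab_size` columns.

    Assumption: valid ids are 0 .. vocab_size - 1; negative ids are
    reported as invalid as well.
    """
    return [i for i, tok in enumerate(tokens) if not 0 <= tok < vocab_size]


# Example values chosen to mirror the numbers in the error message.
bad_positions = find_out_of_range_tokens([3, 12, 34, 7], vocab_size=34)
print(bad_positions)  # [2]
```

Logging the offending positions, and the characters they came from, might help narrow down which characters in the transcript map to ids outside the align model's output range.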

jnissin commented 10 months ago

I am getting the same error with a Finnish audio file that is about 10 minutes long.

vertti commented 8 months ago

I stumbled on the same error with a 10:40-long Finnish audio file.

MelihDarcanxyz commented 2 months ago

I'm getting the same error with a 40-minute Croatian audio file, using classla/wav2vec2-xls-r-parlaspeech-hr-lm as the align model.