jianfch / stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper
MIT License

IndexError: index 0 is out of bounds for axis 0 with size 0 #321

Closed: xinzuan closed 6 months ago

xinzuan commented 8 months ago

Hi, I encountered an issue when the video duration is shorter than the text duration. Specifically, the following error occurs:

stable_whisper/alignment.py", line 547, in align
    [(np.flatnonzero(word_lens >= i)[0] + 1) for i in split_indices_by_char]
                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: index 0 is out of bounds for axis 0 with size 0

To address this, I made a temporary modification to the code on line 547:

[(np.flatnonzero(word_lens >= i)[0] + 1) for i in split_indices_by_char if np.flatnonzero(word_lens >= i).size > 0]

This change has resolved the issue for now (at least in my case). However, I am wondering whether you have a more permanent or elegant solution.
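To illustrate why the original expression raises and what the filtered version does, here is a minimal sketch with made-up values. It assumes word_lens holds cumulative character counts per word, which the thread does not confirm; the helper first_word_index and the sample arrays are hypothetical and are not part of stable-ts.

```python
import numpy as np

# Hypothetical values; in stable-ts these come from the alignment internals.
word_lens = np.array([3, 7, 12])       # assumed: cumulative character counts per word
split_indices_by_char = [2, 7, 15]     # 15 exceeds every entry of word_lens

def first_word_index(word_lens, i):
    """Return the 1-based index of the first word reaching character i, or None."""
    hits = np.flatnonzero(word_lens >= i)
    return int(hits[0]) + 1 if hits.size > 0 else None

# The unguarded expression from the traceback fails on the out-of-range index.
unguarded_failed = False
try:
    [(np.flatnonzero(word_lens >= i)[0] + 1) for i in split_indices_by_char]
except IndexError:
    unguarded_failed = True

# The guarded version drops indices that no word can satisfy.
guarded = [
    idx
    for idx in (first_word_index(word_lens, i) for i in split_indices_by_char)
    if idx is not None
]
```

Note that the guard drops the out-of-range split without reporting it, so it hides whatever produced that index in the first place.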

jianfch commented 8 months ago

That line is from an older version; it should be line 543 on the latest commit. This might already be fixed in the latest version, so try updating stable-ts and see if you can replicate the issue.

If the issue persists, there is a bug in the values of split_indices_by_char. In either case, the fix shouldn't be made on that line.
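One way to check whether split_indices_by_char contains bad values is to list the entries that no element of word_lens can satisfy. The sketch below is a debugging aid under assumptions: out_of_range_splits is a hypothetical helper, not a stable-ts function, and the sample inputs are invented.

```python
import numpy as np

def out_of_range_splits(split_indices_by_char, word_lens):
    """Return the split indices for which `word_lens >= i` has no match (hypothetical helper)."""
    word_lens = np.asarray(word_lens)
    if word_lens.size == 0:
        # With no words at all, every split index is unsatisfiable.
        return list(split_indices_by_char)
    limit = word_lens.max()
    return [i for i in split_indices_by_char if i > limit]

# Invented inputs: 15 is larger than any value in word_lens.
bad = out_of_range_splits([2, 7, 15], [3, 7, 12])
```

A non-empty result points at where split_indices_by_char is computed rather than at the line that raised.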

xinzuan commented 8 months ago

Hi, thank you for the fast response and the suggestion. I have already upgraded the model to the latest commit, but the issue still persists.

I will reinvestigate to find the cause.