linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0

"Got infinite logprob" assertion failure, with option condition_on_previous_text=False #9

Closed ItakeLs closed 1 year ago

ItakeLs commented 1 year ago

Some audio files trigger the "Got infinite logprob" assertion. Every file that triggers it transcribes fine with the Whisper model from the OpenAI repo. The error occurs in the "may_flush_segment" function:

# see GreedyDecoder.update()
chunck_indices = chunk_tokens_nosot + [tokenizer.eot]
assert len(chunk_logprobs) == len(chunck_indices), f"{len(chunk_logprobs)} != {len(chunck_indices)}"
logprobs = [logprob[i] for (logprob, i) in zip(chunk_logprobs, chunck_indices)]
assert min([p.isfinite().item() for p in logprobs]), "Got infinite logprob"
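
To make the check above concrete, here is a minimal pure-Python sketch of what the assertion tests. It is not the library's code: the 4-token vocabulary, the `log_softmax` helper, and `selected_logprobs_are_finite` are hypothetical, and plain floats stand in for tensors. It shows one general way a selected log-probability can become infinite, namely when the chosen token had its logit masked to -inf. Whether that is what happens in this issue is an assumption, not a diagnosis.

```python
import math


def log_softmax(logits):
    """Log-softmax over a list of floats; logits of -inf stay -inf."""
    finite = [x for x in logits if x != -math.inf]
    m = max(finite)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in finite))
    return [x - log_z for x in logits]


# Hypothetical vocabulary of 4 tokens; token 2 is masked with a -inf logit
# at both decoding steps.
step_logits = [
    [2.0, 0.5, -math.inf, 0.1],
    [0.3, 1.2, -math.inf, 0.0],
]
chunk_logprobs = [log_softmax(row) for row in step_logits]


def selected_logprobs_are_finite(chunk_logprobs, chunk_indices):
    """Mirror of the check in the snippet: one logprob per selected token."""
    assert len(chunk_logprobs) == len(chunk_indices)
    logprobs = [row[i] for row, i in zip(chunk_logprobs, chunk_indices)]
    return all(math.isfinite(p) for p in logprobs)


# Selecting unmasked tokens passes the check.
ok = selected_logprobs_are_finite(chunk_logprobs, [0, 1])
# Selecting the masked token (index 2) at step 2 fails it.
bad = selected_logprobs_are_finite(chunk_logprobs, [0, 2])
```

In this sketch the check only fails when the token sequence and the per-step log-probabilities disagree about which tokens were allowed, which is the kind of mismatch the assertion guards against.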

One audio sample that reliably reproduces the error is the MP4 from this YouTube link -> https://www.youtube.com/watch?v=D9G1VOjN_84 I downloaded the MP4 from here -> https://yt1ss.net/en?q=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DD9G1VOjN_84 (I would upload it, but there is a 10 MB limit and the file is 18 MB)

I ran the audio with the medium model, with condition_on_previous_text=False and all other parameters left at their defaults.
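
The setup described above could be scripted roughly as follows. This is a sketch using the Python API of whisper-timestamped; the function name `transcribe_without_conditioning`, the `device="cpu"` choice, and the local file name in the usage note are assumptions, not details from the report.

```python
def transcribe_without_conditioning(audio_path, model_size="medium"):
    """Transcribe audio_path with condition_on_previous_text disabled.

    The import is deferred so that defining this function does not require
    the package (or a model download) until it is actually called.
    """
    import whisper_timestamped as whisper

    audio = whisper.load_audio(audio_path)
    # Assumption: CPU inference; the report does not say which device was used.
    model = whisper.load_model(model_size, device="cpu")
    # All other parameters are left at their defaults, as in the report.
    return whisper.transcribe(model, audio, condition_on_previous_text=False)
```

Calling `transcribe_without_conditioning("D9G1VOjN_84.mp4")` on the downloaded file (hypothetical local name) should then exercise the same code path as the report.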

Jeronymous commented 1 year ago

Thanks for reporting this and providing a way to reproduce it. It seems to be caused by "condition_on_previous_text=False". I'll fix it.

seanth commented 1 year ago

Here to report the same bug.