m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License

Running on short audio: KeyError: 'level_1' #49

Closed arnavmehta7 closed 1 year ago

arnavmehta7 commented 1 year ago

Failed to align segment: no characters in this segment found in model dictionary, resorting to original...

I was trying to align an audio file, but it didn't work and gave the error above. It was a plain English .wav file.

m-bain commented 1 year ago

Are you using the most up-to-date version?

pip install git+https://github.com/m-bain/whisperx.git --upgrade

This should just be a warning (raised when a segment contains only numerals/symbols); it will still output the transcription with timestamps.

arnavmehta7 commented 1 year ago

Failed to align segment ("."): no characters in this segment found in model dictionary, resorting to original...
/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/whisperx/alignment.py:294: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.
To preserve the previous behavior, use
    >>> .groupby(..., group_keys=False)
To adopt the future behavior and silence this warning, use
    >>> .groupby(..., group_keys=True)
  char_segments_arr = per_seg_grp.apply(lambda x: x.reset_index(drop = True)).reset_index()
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'level_1'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/cog/server/worker.py", line 209, in _predict
    result = self._predictor.predict(**payload)
  File "predict.py", line 126, in predict
    return self.generate_aligned_transcription_whisperx(transcript, audio_path)
  File "predict.py", line 45, in generate_aligned_transcription_whisperx
    return whisperx.align(val['segments'],
  File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/whisperx/alignment.py", line 384, in align
    cseg['segment-text-start'] = cseg['level_1']
  File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/pandas/core/frame.py", line 3807, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'level_1'

arnavmehta7 commented 1 year ago

I am getting the above error after updating to the latest version.

arnavmehta7 commented 1 year ago

My text definitely is fine, but it is not working :( [{'id': 0, 'seek': 0, 'start': 0.028, 'end': 60.44, 'text': " I wake up at three o'clock in the morning and just drive over here. Sweet. What's your morning routine like? Or do you have a morning routine that you like? Or do you just kind of fasted cardio? This is what I like to do or fasted yoga. That's my most recent thing. So I'm doing either 14 to 16 hours depending upon what my day looks like. So I'm intermittent fasting. And then I usually either like yoga or running in the morning. Today was running and then I'll do something in the afternoon and either martial arts related or weightlifting related.", 'tokens': [286, 6634, 493, 412, 1045, 277, 6, 9023, 294, 264, 2446, 293, 445, 3332, 670, 510, 13, 14653, 13, 708, 311, 428, 2446, 9927, 411, 30, 1610, 360, 291, 362, 257, 2446, 9927, 300, 291, 411, 30, 1610, 360, 291, 445, 733, 295, 2370, 292, 34274, 30, 639, 307, 437, 286, 411, 281, 360, 420, 2370, 292, 15128, 13, 663, 311, 452, 881, 5162, 551, 13, 407, 286, 478, 884, 2139, 3499, 281, 3165, 2496, 5413, 3564, 437, 452, 786, 1542, 411, 13, 407, 286, 478, 44084, 22371, 13, 400, 550, 286, 2673, 2139, 411, 15128, 420, 2614, 294, 264, 2446, 13, 2692, 390, 2614, 293, 550, 286, 603, 360, 746, 294, 264, 6499, 293, 2139, 20755, 8609, 4077, 420, 3364, 34724, 4077, 13], 'temperature': 0.0, 'avg_logprob': -0.18404083251953124, 'compression_ratio': 1.8184818481848184, 'no_speech_prob': 0.07352152466773987}, {'id': 1, 'seek': 3000, 'start': 30.028, 'end': 90.44, 'text': '.', 'tokens': [2411], 'temperature': 1.0, 'avg_logprob': -1.1083998680114746, 'compression_ratio': 0.1111111111111111, 'no_speech_prob': 0.6738296151161194}]

m-bain commented 1 year ago

What is your pandas version? I can't reproduce this error. Can you paste the .wav file if it's small? Is this happening with every audio file or just this one?

arnavmehta7 commented 1 year ago

1.5.3

https://user-images.githubusercontent.com/65492948/215346029-f1571dbe-ce74-4ef3-b2a5-1de584d771b3.mp4

Uploading mp3 wasn't allowed, so I converted the mp3 to mp4; please convert it back.

Reference code to reproduce the error:

content = [{'id': 0, 'seek': 0, 'start': 0.028, 'end': 60.44, 'text': " I wake up at three o'clock in the morning and just drive over here. Sweet. What's your morning routine like? Or do you have a morning routine that you like? Or do you just kind of fasted cardio? This is what I like to do or fasted yoga. That's my most recent thing. So I'm doing either 14 to 16 hours depending upon what my day looks like. So I'm intermittent fasting. And then I usually either like yoga or running in the morning. Today was running and then I'll do something in the afternoon and either martial arts related or weightlifting related.", 'tokens': [286, 6634, 493, 412, 1045, 277, 6, 9023, 294, 264, 2446, 293, 445, 3332, 670, 510, 13, 14653, 13, 708, 311, 428, 2446, 9927, 411, 30, 1610, 360, 291, 362, 257, 2446, 9927, 300, 291, 411, 30, 1610, 360, 291, 445, 733, 295, 2370, 292, 34274, 30, 639, 307, 437, 286, 411, 281, 360, 420, 2370, 292, 15128, 13, 663, 311, 452, 881, 5162, 551, 13, 407, 286, 478, 884, 2139, 3499, 281, 3165, 2496, 5413, 3564, 437, 452, 786, 1542, 411, 13, 407, 286, 478, 44084, 22371, 13, 400, 550, 286, 2673, 2139, 411, 15128, 420, 2614, 294, 264, 2446, 13, 2692, 390, 2614, 293, 550, 286, 603, 360, 746, 294, 264, 6499, 293, 2139, 20755, 8609, 4077, 420, 3364, 34724, 4077, 13], 'temperature': 0.0, 'avg_logprob': -0.18404083251953124, 'compression_ratio': 1.8184818481848184, 'no_speech_prob': 0.07352152466773987}, {'id': 1, 'seek': 3000, 'start': 30.028, 'end': 90.44, 'text': '.', 'tokens': [2411], 'temperature': 1.0, 'avg_logprob': -1.1083998680114746, 'compression_ratio': 0.1111111111111111, 'no_speech_prob': 0.6738296151161194}]

result_aligned = whisperx.align(content, model_a, metadata, audio_file, device)

arnavmehta7 commented 1 year ago

Somehow there is a full stop at the end; I don't know why it produced that...

arnavmehta7 commented 1 year ago

@m-bain I found the error

Failed to align segment ("."): no characters in this segment found in model dictionary, resorting to original...
/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/whisperx/alignment.py:294: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.
To preserve the previous behavior, use
    >>> .groupby(..., group_keys=False)
To adopt the future behavior and silence this warning, use
    >>> .groupby(..., group_keys=True)
  char_segments_arr = per_seg_grp.apply(lambda x: x.reset_index(drop = True)).reset_index()
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'level_1'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/cog/server/worker.py", line 209, in _predict
    result = self._predictor.predict(**payload)
  File "predict.py", line 126, in predict
    return self.generate_aligned_transcription_whisperx(transcript, audio_path)
  File "predict.py", line 45, in generate_aligned_transcription_whisperx
    return whisperx.align(val['segments'],
  File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/whisperx/alignment.py", line 384, in align
    cseg['segment-text-start'] = cseg['level_1']
  File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/pandas/core/frame.py", line 3807, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'level_1'

@m-bain It seems like whisperx fails to align punctuation. A possible fix could be to not consider it, maybe.
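
One way to act on that suggestion is to drop segments that contain nothing the alignment model can match before they reach `whisperx.align`. The sketch below is an illustration only, not the maintainer's fix: `ASSUMED_VOCAB` and `drop_unalignable` are hypothetical names, the real alignment dictionary may hold different characters, and the first segment's text is shortened from the transcript posted above.

```python
import string

# Hypothetical stand-in for the alignment model's character dictionary.
# The real dictionary depends on the alignment model that is loaded.
ASSUMED_VOCAB = set(string.ascii_lowercase) | {"'"}


def drop_unalignable(segments, vocab=ASSUMED_VOCAB):
    """Keep only segments whose text has at least one character in vocab."""
    return [
        seg for seg in segments
        if any(ch in vocab for ch in seg["text"].lower())
    ]


# Two segments shaped like the Whisper output in this thread (text shortened).
segments = [
    {"id": 0, "start": 0.028, "end": 60.44,
     "text": " I wake up at three o'clock in the morning"},
    {"id": 1, "start": 30.028, "end": 90.44, "text": "."},
]

kept = drop_unalignable(segments)
print([seg["id"] for seg in kept])  # prints [0]
```

The filtered list could then be passed to the alignment call in place of the raw segments. Note that this only removes the punctuation-only segment; it does not by itself address the pandas `KeyError`.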

arnavmehta7 commented 1 year ago

@m-bain Apparently the pandas error remains for other audio sample files as well on the latest version.


/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/whisperx/alignment.py:294: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.
To preserve the previous behavior, use
    >>> .groupby(..., group_keys=False)
To adopt the future behavior and silence this warning, use
    >>> .groupby(..., group_keys=True)
  char_segments_arr = per_seg_grp.apply(lambda x: x.reset_index(drop = True)).reset_index()
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'level_1'

arnavmehta7 commented 1 year ago

@m-bain Can you also please tell me when the key "word-level" was present in the code? Lately I also get a "word-level" error, as I had been using that key.

sbuser commented 1 year ago

For the record: I also get this KeyError after updating to latest. It's not just his audio or install.

sbuser commented 1 year ago

per_seg_grp.head()
segment-idx  subsegment-idx  word-idx char     start       end     score

char_segments_arr.head()
   index  segment-idx  subsegment-idx  word-idx char     start       end     score
0      0            0               0         0            NaN       NaN       NaN

per_word_grp.head()
    index  segment-idx  subsegment-idx  word-idx char     start       end     score
1       1            0               0         0    N  0.241611  0.402685  0.999866
2       2            0               0         0    o  0.402685  0.422819  0.999049

I didn't send all of the data, but this is the general structure of the DataFrames when it goes looking for per_word_grp["level_1"] and doesn't find it.

Don't be alarmed by the NaN values; there is data further down in these frames.

sbuser commented 1 year ago

I can't see where this index is supposed to come from unless there's a missing call to reset_index() somewhere. Or is the base DF from which this one is constructed somehow missing indices? I'm not really following what you're trying to do here. Still, if you can't reproduce on your end, I can provide info from the debugger if that's helpful.

I am also on pandas 1.5.3 - I notice there's no locked version in the requirements.txt in the repo. What version are you on?

sbuser commented 1 year ago

OK, I believe this happens when there is only a single segment in an audio file. It works fine on longer files where whisper returns multiple segments. When a single segment is returned, the expected index columns aren't there.

I'm not familiar enough with pandas to understand how to resolve it right now, but if you take a single-sentence audio file you may be able to reproduce it.
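
The traceback shows `cseg['segment-text-start'] = cseg['level_1']`, and the warning shows that column coming out of `groupby(...).apply(...).reset_index()`. Assuming `level_1` was meant to hold each row's position within its segment (an inference from the traceback, not something confirmed in this thread), that position can be computed directly with `GroupBy.cumcount()`, which does not depend on how `apply` builds the result index. The helper name `add_segment_text_start` and the toy frames are hypothetical; this is a sketch, not the fix applied in whisperX.

```python
import pandas as pd


def add_segment_text_start(chars: pd.DataFrame) -> pd.DataFrame:
    """Add each character's position within its segment.

    Assumes `chars` has a "segment-idx" column and that rows are already
    in character order, as in the frames printed above.
    """
    out = chars.reset_index(drop=True).copy()
    # cumcount() numbers rows 0..n-1 inside each group, for one group or many.
    out["segment-text-start"] = out.groupby("segment-idx").cumcount()
    return out


# Toy data: one frame with a single segment, one with two segments.
single = pd.DataFrame({"segment-idx": [0, 0, 0], "char": list("No.")})
multi = pd.DataFrame({"segment-idx": [0, 0, 1, 1, 1], "char": list("Hiyes")})

print(add_segment_text_start(single)["segment-text-start"].tolist())  # prints [0, 1, 2]
print(add_segment_text_start(multi)["segment-text-start"].tolist())   # prints [0, 1, 0, 1, 2]
```

Because the column is created explicitly, the single-segment and multi-segment cases go through the same code path and produce the same column name.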

arnavmehta7 commented 1 year ago

I had posted this pandas error yesterday as well :\

m-bain commented 1 year ago

@sbuser Amazing, thank you for the find! I will look into a fix when I have time.

arnavmehta7 commented 1 year ago

To fix my issues, I just reverted to an older checkpoint from the stable branch.

arnavmehta7 commented 1 year ago

@m-bain amazing work though!

puresky07 commented 1 year ago

@m-bain Thank you so much for this amazing work! I had the same issue as @arnavmehta7. I hope it won't be too much trouble for you to fix it. @arnavmehta7 Which commit did you go back to in order to get it working?

arnavmehta7 commented 1 year ago

@puresky07 https://github.com/m-bain/whisperX/commit/ba102feb7ff30e6f8345f00470955f5632e767e2 this one

puresky07 commented 1 year ago

@arnavmehta7 Thank you, it does work much better indeed! And it seems I have fewer incoherent timestamps too (issue 56).

arnavmehta7 commented 1 year ago

@m-bain Hey, the audio file here is 60 seconds, not 3 seconds, but the error still occurs... Could you tell me a way to fix it?