Closed arnavmehta7 closed 1 year ago
Are you using the most up-to-date version?
pip install git+https://github.com/m-bain/whisperx.git --upgrade
This should just be a warning (raised when a segment contains only numerals/symbols); it will still output the transcription with timestamps.
Failed to align segment ("."): no characters in this segment found in model dictionary, resorting to original...
/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/whisperx/alignment.py:294: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.
To preserve the previous behavior, use
>>> .groupby(..., group_keys=False)
To adopt the future behavior and silence this warning, use
>>> .groupby(..., group_keys=True)
char_segments_arr = per_seg_grp.apply(lambda x: x.reset_index(drop = True)).reset_index()
Traceback (most recent call last):
File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'level_1'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/cog/server/worker.py", line 209, in _predict
result = self._predictor.predict(**payload)
File "predict.py", line 126, in predict
return self.generate_aligned_transcription_whisperx(transcript, audio_path)
File "predict.py", line 45, in generate_aligned_transcription_whisperx
return whisperx.align(val['segments'],
File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/whisperx/alignment.py", line 384, in align
cseg['segment-text-start'] = cseg['level_1']
File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/pandas/core/frame.py", line 3807, in __getitem__
indexer = self.columns.get_loc(key)
File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
raise KeyError(key) from err
KeyError: 'level_1'
I am getting the above error after updating to the latest version.
My text definitely is fine, but it is not working :(
[{'id': 0, 'seek': 0, 'start': 0.028, 'end': 60.44, 'text': " I wake up at three o'clock in the morning and just drive over here. Sweet. What's your morning routine like? Or do you have a morning routine that you like? Or do you just kind of fasted cardio? This is what I like to do or fasted yoga. That's my most recent thing. So I'm doing either 14 to 16 hours depending upon what my day looks like. So I'm intermittent fasting. And then I usually either like yoga or running in the morning. Today was running and then I'll do something in the afternoon and either martial arts related or weightlifting related.", 'tokens': [286, 6634, 493, 412, 1045, 277, 6, 9023, 294, 264, 2446, 293, 445, 3332, 670, 510, 13, 14653, 13, 708, 311, 428, 2446, 9927, 411, 30, 1610, 360, 291, 362, 257, 2446, 9927, 300, 291, 411, 30, 1610, 360, 291, 445, 733, 295, 2370, 292, 34274, 30, 639, 307, 437, 286, 411, 281, 360, 420, 2370, 292, 15128, 13, 663, 311, 452, 881, 5162, 551, 13, 407, 286, 478, 884, 2139, 3499, 281, 3165, 2496, 5413, 3564, 437, 452, 786, 1542, 411, 13, 407, 286, 478, 44084, 22371, 13, 400, 550, 286, 2673, 2139, 411, 15128, 420, 2614, 294, 264, 2446, 13, 2692, 390, 2614, 293, 550, 286, 603, 360, 746, 294, 264, 6499, 293, 2139, 20755, 8609, 4077, 420, 3364, 34724, 4077, 13], 'temperature': 0.0, 'avg_logprob': -0.18404083251953124, 'compression_ratio': 1.8184818481848184, 'no_speech_prob': 0.07352152466773987}, {'id': 1, 'seek': 3000, 'start': 30.028, 'end': 90.44, 'text': '.', 'tokens': [2411], 'temperature': 1.0, 'avg_logprob': -1.1083998680114746, 'compression_ratio': 0.1111111111111111, 'no_speech_prob': 0.6738296151161194}]
What is your pandas version? I can't reproduce this error. Can you paste the .wav file if it's small? Is this happening with every audio file or just this one?
1.5.3
https://user-images.githubusercontent.com/65492948/215346029-f1571dbe-ce74-4ef3-b2a5-1de584d771b3.mp4 Uploading mp3 wasn't allowed, so I converted the mp3 to mp4; please convert it back.
Reference Code to Reproduce error
content = [{'id': 0, 'seek': 0, 'start': 0.028, 'end': 60.44, 'text': " I wake up at three o'clock in the morning and just drive over here. Sweet. What's your morning routine like? Or do you have a morning routine that you like? Or do you just kind of fasted cardio? This is what I like to do or fasted yoga. That's my most recent thing. So I'm doing either 14 to 16 hours depending upon what my day looks like. So I'm intermittent fasting. And then I usually either like yoga or running in the morning. Today was running and then I'll do something in the afternoon and either martial arts related or weightlifting related.", 'tokens': [286, 6634, 493, 412, 1045, 277, 6, 9023, 294, 264, 2446, 293, 445, 3332, 670, 510, 13, 14653, 13, 708, 311, 428, 2446, 9927, 411, 30, 1610, 360, 291, 362, 257, 2446, 9927, 300, 291, 411, 30, 1610, 360, 291, 445, 733, 295, 2370, 292, 34274, 30, 639, 307, 437, 286, 411, 281, 360, 420, 2370, 292, 15128, 13, 663, 311, 452, 881, 5162, 551, 13, 407, 286, 478, 884, 2139, 3499, 281, 3165, 2496, 5413, 3564, 437, 452, 786, 1542, 411, 13, 407, 286, 478, 44084, 22371, 13, 400, 550, 286, 2673, 2139, 411, 15128, 420, 2614, 294, 264, 2446, 13, 2692, 390, 2614, 293, 550, 286, 603, 360, 746, 294, 264, 6499, 293, 2139, 20755, 8609, 4077, 420, 3364, 34724, 4077, 13], 'temperature': 0.0, 'avg_logprob': -0.18404083251953124, 'compression_ratio': 1.8184818481848184, 'no_speech_prob': 0.07352152466773987}, {'id': 1, 'seek': 3000, 'start': 30.028, 'end': 90.44, 'text': '.', 'tokens': [2411], 'temperature': 1.0, 'avg_logprob': -1.1083998680114746, 'compression_ratio': 0.1111111111111111, 'no_speech_prob': 0.6738296151161194}]
result_aligned = whisperx.align(content, model_a, metadata, audio_file, device)
Somehow there is a full stop at the end; I don't know why it produced that...
@m-bain I found the error
@m-bain It seems like whisperx fails to align punctuation-only segments. A possible fix could be to not consider them, maybe.
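A minimal sketch of that idea, done on the caller's side before alignment. The helper names `has_alignable_text` and `split_alignable` are hypothetical (they are not part of whisperx), the sample text is shortened from the transcript posted above, and the "contains at least one letter" test is only an assumption about what the alignment model's dictionary can handle:

```python
def has_alignable_text(segment):
    """Return True if the segment text contains at least one letter.

    Assumption: segments made only of punctuation or numerals are the ones
    that trigger the "no characters ... found in model dictionary" warning.
    """
    return any(ch.isalpha() for ch in segment.get("text", ""))


def split_alignable(segments):
    """Split segments into (alignable, skipped) lists, preserving order."""
    alignable, skipped = [], []
    for seg in segments:
        (alignable if has_alignable_text(seg) else skipped).append(seg)
    return alignable, skipped


# Shortened version of the two segments from the report above.
segments = [
    {"id": 0, "start": 0.028, "end": 60.44,
     "text": " I wake up at three o'clock in the morning."},
    {"id": 1, "start": 30.028, "end": 90.44, "text": "."},
]

alignable, skipped = split_alignable(segments)
print("to align:", [s["id"] for s in alignable])
print("skipped:", [s["id"] for s in skipped])
```

Only the `alignable` list would then be passed on for alignment; the skipped segments keep their original Whisper timestamps. Whether this also avoids the KeyError is not confirmed in this thread.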
@m-bain Apparently the pandas error also occurs for other audio sample files on the latest version:
/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/whisperx/alignment.py:294: FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.
To preserve the previous behavior, use
>>> .groupby(..., group_keys=False)
To adopt the future behavior and silence this warning, use
>>> .groupby(..., group_keys=True)
char_segments_arr = per_seg_grp.apply(lambda x: x.reset_index(drop = True)).reset_index()
Traceback (most recent call last):
File "/root/.pyenv/versions/3.9.16/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'level_1'
@m-bain Can you also please tell me when the key "word-level" was present in the code? Lately I also get a word-level error, as I had been using that key.
For the record: I also get this KeyError after updating to latest. It's not just his audio or install.
per_seg_grp.head()
segment-idx subsegment-idx word-idx char start end score
char_segments_arr.head()
index segment-idx subsegment-idx word-idx char start end score
0 0 0 0 0 NaN NaN NaN
per_word_grp.head()
index segment-idx subsegment-idx word-idx char start end score
1 1 0 0 0 N 0.241611 0.402685 0.999866
2 2 0 0 0 o 0.402685 0.422819 0.999049
I didn't send all of the data, but this is the general structure of the DFs when it goes looking for per_word_grp["level_1"] and doesn't find it.
Don't be alarmed by the NaNs etc.; there is data further down in these frames.
I can't see where this index is supposed to come from unless there's a missing call to reset_index() somewhere. Or is the base DF from which this one is constructed somehow missing indices? I'm not really following what you're trying to do here. Still, if you can't reproduce on your end, I can provide info from the debugger if that's helpful.
I am also on pandas 1.5.3 - I notice there's no locked version in the requirements.txt in the repo. What version are you on?
Ok - I believe this is happening as a result of there being only a single segment in an audio file. It works fine on longer files where whisper is returning multiple segments. When a single segment is returned, the expected indexes aren't there.
I'm not familiar enough with pandas to understand how to resolve it right now, but if you take a single sentence audio file you may be able to reproduce.
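One way to sidestep the dependence on how many groups there are is to compute each character's position within its segment directly, instead of relying on the `level_1` column that `groupby(...).apply(...).reset_index()` only produces when pandas prepends the group keys. The sketch below is a hedged illustration, not the library's fix: the column names `segment-idx`, `char`, and `segment-text-start` are taken from the output and traceback in this thread, the helper `add_position_in_segment` is hypothetical, and treating `cumcount()` as equivalent to what alignment.py intends is an assumption.

```python
import pandas as pd


def add_position_in_segment(chars):
    """Add a 0-based per-segment position column without using 'level_1'."""
    out = chars.reset_index(drop=True)
    # cumcount() numbers the rows inside each group 0..n-1, and behaves the
    # same whether the frame holds one segment or many.
    out["segment-text-start"] = out.groupby("segment-idx").cumcount()
    return out


# A single-segment frame (the failing case described above) and a
# two-segment frame for comparison. The characters are made-up sample data.
single = pd.DataFrame({"segment-idx": [0, 0, 0], "char": list("No.")})
multi = pd.DataFrame({"segment-idx": [0, 0, 1, 1, 1], "char": list("Nook.")})

print(add_position_in_segment(single))
print(add_position_in_segment(multi))
```

Because the position is derived from the data rather than from the shape of the resulting index, the single-segment and multi-segment cases go through the same code path.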
I had posted this pandas error yesterday as well :\
@sbuser Amazing, thank you for the find! I will look into a fix when I have time.
To fix my issues, I just reverted to an older checkpoint from the stable branch.
@m-bain amazing work though!
@m-bain Thank you so much for this amazing work! I had the same issue as @arnavmehta7. I hope it won't be too much trouble for you to fix it. @arnavmehta7 Which commit did you go back to in order to get it working?
@arnavmehta7 Thank you, it does work much better indeed! And it seems I have fewer timestamp incoherences too (issue 56).
@m-bain Hey, the audio file here is 60 seconds, not 3 seconds, but the error still occurs... Could you tell me a way to fix it?
Failed to align segment: no characters in this segment found in model dictionary, resorting to original...
I was trying to align an audio file, but it didn't work and gave the above error. It was a plain English .wav file.