Open maxkvbn opened 6 days ago
transformers
No response
examples
--------------------------------------------------------------------------- IndexError Traceback (most recent call last) Cell In[8], [line 1](vscode-notebook-cell:?execution_count=8&line=1) ----> [1](vscode-notebook-cell:?execution_count=8&line=1) asr_out = transcribing_pipe( [2](vscode-notebook-cell:?execution_count=8&line=2) '../SampleData/Saba_interview_short.wav', [3](vscode-notebook-cell:?execution_count=8&line=3) return_timestamps="word", [4](vscode-notebook-cell:?execution_count=8&line=4) generate_kwargs={"language": "danish"} [5](vscode-notebook-cell:?execution_count=8&line=5) ) [7](vscode-notebook-cell:?execution_count=8&line=7) asr_out File c:\Users\User\miniconda3\envs\vva\lib\site-packages\transformers\pipelines\automatic_speech_recognition.py:284, in AutomaticSpeechRecognitionPipeline.__call__(self, inputs, **kwargs) [221](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:221) def __call__( [222](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:222) self, [223](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:223) inputs: Union[np.ndarray, bytes, str], [224](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:224) **kwargs, [225](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:225) ): [226](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:226) """ [227](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:227) Transcribe the audio sequence(s) given as inputs to text. See the [`AutomaticSpeechRecognitionPipeline`] [228](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:228) documentation for more information. (...) [282](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:282) `"".join(chunk["text"] for chunk in output["chunks"])`. [283](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:283) """ --> [284](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:284) return super().__call__(inputs, **kwargs) File c:\Users\User\miniconda3\envs\vva\lib\site-packages\transformers\pipelines\base.py:1246, in Pipeline.__call__(self, inputs, num_workers, batch_size, *args, **kwargs) [1244](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/base.py:1244) return self.iterate(inputs, preprocess_params, forward_params, postprocess_params) [1245](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/base.py:1245) elif self.framework == "pt" and isinstance(self, ChunkPipeline): -> [1246](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/base.py:1246) return next( [1247](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/base.py:1247) iter( [1248](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/base.py:1248) self.get_iterator( [1249](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/base.py:1249) [inputs], num_workers, batch_size, preprocess_params, forward_params, postprocess_params [1250](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/base.py:1250) ) [1251](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/base.py:1251) ) [1252](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/base.py:1252) ) [1253](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/base.py:1253) else: [1254](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/base.py:1254) return self.run_single(inputs, preprocess_params, forward_params, postprocess_params) File c:\Users\User\miniconda3\envs\vva\lib\site-packages\transformers\pipelines\pt_utils.py:125, in PipelineIterator.__next__(self) [123](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/pt_utils.py:123) # We're out of items within a batch [124](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/pt_utils.py:124) item = next(self.iterator) --> [125](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/pt_utils.py:125) processed = self.infer(item, **self.params) [126](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/pt_utils.py:126) # We now have a batch of "inferred things". [127](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/pt_utils.py:127) if self.loader_batch_size is not None: [128](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/pt_utils.py:128) # Try to infer the size of the batch File c:\Users\User\miniconda3\envs\vva\lib\site-packages\transformers\pipelines\automatic_speech_recognition.py:587, in AutomaticSpeechRecognitionPipeline.postprocess(self, model_outputs, decoder_kwargs, return_timestamps, return_language) [584](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:584) stride_right /= sampling_rate [585](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:585) output["stride"] = chunk_len, stride_left, stride_right --> [587](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:587) text, optional = self.tokenizer._decode_asr( [588](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:588) model_outputs, [589](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:589) return_timestamps=return_timestamps, [590](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:590) return_language=return_language, [591](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:591) time_precision=time_precision, [592](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:592) ) [593](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:593) else: [594](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/pipelines/automatic_speech_recognition.py:594) items = np.concatenate(final_items, axis=1) File c:\Users\User\miniconda3\envs\vva\lib\site-packages\transformers\models\whisper\tokenization_whisper.py:832, in WhisperTokenizer._decode_asr(self, model_outputs, return_timestamps, return_language, time_precision) [831](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/models/whisper/tokenization_whisper.py:831) def _decode_asr(self, model_outputs, *, return_timestamps, return_language, time_precision): --> [832](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/models/whisper/tokenization_whisper.py:832) return _decode_asr( [833](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/models/whisper/tokenization_whisper.py:833) self, [834](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/models/whisper/tokenization_whisper.py:834) model_outputs, [835](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/models/whisper/tokenization_whisper.py:835) return_timestamps=return_timestamps, [836](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/models/whisper/tokenization_whisper.py:836) return_language=return_language, [837](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/models/whisper/tokenization_whisper.py:837) time_precision=time_precision, [838](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/models/whisper/tokenization_whisper.py:838) ) File c:\Users\User\miniconda3\envs\vva\lib\site-packages\transformers\models\whisper\tokenization_whisper.py:1032, in _decode_asr(tokenizer, model_outputs, return_timestamps, return_language, time_precision) [1030](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/models/whisper/tokenization_whisper.py:1030) current_tokens.append(token) [1031](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/models/whisper/tokenization_whisper.py:1031) if return_timestamps == "word": -> [1032](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/models/whisper/tokenization_whisper.py:1032) start_time = round(token_timestamps[i] + time_offset, 2) [1033](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/models/whisper/tokenization_whisper.py:1033) if i + 1 < len(token_timestamps): [1034](file:///C:/Users/User/miniconda3/envs/vva/lib/site-packages/transformers/models/whisper/tokenization_whisper.py:1034) end_time = round(token_timestamps[i + 1] + time_offset, 2) IndexError: list index out of range
I've uploaded the audio that I am trying to process here
I have a discussion on the whisper's hub page, where I have implemented a fix that seems to work. here
Colab link here
cc @sanchit-gandhi @kamilakesbi
System Info
transformers
version: 4.42.2Who can help?
No response
Information
Tasks
examples
folder (such as GLUE/SQuAD, ...)Reproduction
I've uploaded the audio that I am trying to process here
I have a discussion on the whisper's hub page, where I have implemented a fix that seems to work. here
Colab link here
Expected behavior