GrahLnn opened 2 days ago
Hey,
The long-form logic is something we will work on next, since the transformers implementation is not ideal for our model.
However, as a quick fix you can try installing our custom fork and see if that resolves your problem:
`pip install git+https://github.com/nyrahealth/transformers.git@crisper_whisper`
If this does not do it, let me know and we'll look into it further together.
Best,
Laurin
Thank you for your help. Now there is a new error:
Traceback (most recent call last):
File "C:\Users\grahl\criwhisper\test.py", line 76, in <module>
res = transcribe_audio(
^^^^^^^^^^^^^^^^^
File "C:\Users\grahl\criwhisper\test.py", line 71, in transcribe_audio
result = pipe(file_path)
^^^^^^^^^^^^^^^
File "C:\Users\grahl\criwhisper\.venv\Lib\site-packages\transformers\pipelines\automatic_speech_recognition.py", line 292, in __call__
return super().__call__(inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\grahl\criwhisper\.venv\Lib\site-packages\transformers\pipelines\base.py", line 1154, in __call__
return next(
^^^^^
File "C:\Users\grahl\criwhisper\.venv\Lib\site-packages\transformers\pipelines\pt_utils.py", line 124, in __next__
item = next(self.iterator)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\grahl\criwhisper\.venv\Lib\site-packages\transformers\pipelines\pt_utils.py", line 266, in __next__
processed = self.infer(next(self.iterator), **self.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\grahl\criwhisper\.venv\Lib\site-packages\transformers\pipelines\base.py", line 1068, in forward
model_outputs = self._forward(model_inputs, **forward_params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\grahl\criwhisper\.venv\Lib\site-packages\transformers\pipelines\automatic_speech_recognition.py", line 507, in _forward
tokens = self.model.generate(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\grahl\criwhisper\.venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 624, in generate
outputs["token_timestamps"] = self._extract_token_timestamps(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\grahl\criwhisper\.venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 316, in _extract_token_timestamps
timestamps[batch_idx, 1:] = torch.tensor(jump_times)
~~~~~~~~~~^^^^^^^^^^^^^^^
RuntimeError: The expanded size of the tensor (4) must match the existing size (5) at non-singleton dimension 0. Target sizes: [4]. Tensor sizes: [5]
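For context, the RuntimeError is a plain shape mismatch inside `_extract_token_timestamps`: the number of cross-attention "jump times" collected for one batch item (5) does not equal the number of timestamp slots in `timestamps[batch_idx, 1:]` (4). A minimal sketch of the arithmetic, using the sizes from the error message (the variable names and values beyond those sizes are hypothetical):

```python
# Sizes taken from the error message ("Target sizes: [4]. Tensor sizes: [5]");
# everything else here is illustrative, not the actual library code.

num_timestamp_slots = 4                  # size of timestamps[batch_idx, 1:]
jump_times = [0.0, 0.5, 1.1, 1.8, 2.4]   # 5 jump times were collected

def sizes_match(slots: int, values: list) -> bool:
    """Mimics the length check that the tensor assignment enforces."""
    return slots == len(values)

# False here corresponds to the RuntimeError in the traceback:
print(sizes_match(num_timestamp_slots, jump_times))  # prints False
```

So the long-form generation loop produced one more timestamp value than the token buffer has room for, an off-by-one that only surfaces on long inputs.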
I tried to transcribe an hour-long audio file and got this error. A two-minute clip transcribed with good results, so I wanted to try the longer audio. Is there any way to fix it? Thank you.
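A possible stopgap while the long-form path is being fixed (untested for this model, so treat it as a sketch): passing `chunk_length_s` makes the transformers pipeline split long audio into overlapping windows instead of taking the sequential long-form `generate()` path that raised the error above. The model id below is a placeholder for whatever checkpoint you are loading:

```python
# Build the kwargs for a chunked ASR pipeline. MODEL_ID is a placeholder --
# substitute the CrisperWhisper checkpoint you actually use.
MODEL_ID = "nyrahealth/CrisperWhisper"

def chunked_asr_kwargs(model_id: str) -> dict:
    # chunk_length_s=30 forces chunked inference over ~30 s windows,
    # avoiding the sequential long-form decoding that crashed above.
    return {
        "task": "automatic-speech-recognition",
        "model": model_id,
        "chunk_length_s": 30,
        "return_timestamps": "word",
    }

kwargs = chunked_asr_kwargs(MODEL_ID)
# Then: from transformers import pipeline
#       pipe = pipeline(**kwargs)
#       result = pipe("long_audio.wav")
```

Chunked inference trades some boundary accuracy for robustness on long files, so word timestamps near chunk edges may be slightly off, but it should not hit the long-form timestamp-extraction code at all.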