An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
'RuntimeError: stft input and window must be on the same device but got self on cuda:1 and window on cuda:0' when specifying device_index=1 in whisper_s2t.load_model #56
I have 4 Tesla V100s. When I specify device_index=1, the transcribe_with_vad method fails.

Here is my code:

```python
model = whisper_s2t.load_model(
    model_identifier="large-v2",
    backend="CTranslate2",
    compute_type="int8",
    device_index=1,
)
```

Here is the error log:

```
File "/home/asr/code/src/asr_server/modules/transcribe/test.py", line 34, in test
out, infos = model.transcribe_with_vad(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/backends/init.py", line 215, in transcribe_with_vad
for (
File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/data.py", line 264, in get_data_loader_with_vad
start_ends, audio_signal, audio_duration = self.speech_segmenter(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/speech_segmenter/init.py", line 148, in call
speech_probs = self.vad_model(audio_signal)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/speech_segmenter/frame_vad.py", line 128, in call
speech_probs = self.forward(input_signal, input_signal_length)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, *kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(args, kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/speech_segmenter/frame_vad.py", line 104, in forward
x, x_len = self.vad_pp(input_signal_pt, input_signal_length_pt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, *kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/torch/preprocessor/___torch_mangle_6.py", line 23, in forward
    _9 = [torch.size(input0, 1), torch.size(input0, 2)]
    input1 = torch.view(input0, _9)
    x = torch.stft(input1, 512, 160, 400, CONSTANTS.c4, False, None, True)
RuntimeError: stft input and window must be on the same device but got self on cuda:1 and window on cuda:0
```
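For context, the same error is easy to reproduce outside WhisperS2T: torch.stft refuses to mix an input tensor and a window tensor that live on different GPUs. A minimal sketch of what appears to be happening (the tensor names below are illustrative, not taken from the WhisperS2T code; it assumes a machine with at least two CUDA devices):

```python
import torch

# Minimal reproduction of the underlying error (assumes >= 2 CUDA GPUs).
# The Hann window is created on cuda:0 while the signal lives on cuda:1,
# mirroring a VAD preprocessor that keeps its precomputed window on the
# default device while receiving audio moved to device_index=1.
signal = torch.randn(1, 16000, device="cuda:1")
window = torch.hann_window(400, device="cuda:0")

# Raises: RuntimeError: stft input and window must be on the same device
# but got self on cuda:1 and window on cuda:0
torch.stft(signal, n_fft=512, hop_length=160, win_length=400,
           window=window, center=False, return_complex=True)
```

This matches the traceback above: the CTranslate2 model is placed on cuda:1 via device_index, while the TorchScript VAD preprocessor seems to keep its STFT window on the default device cuda:0.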