shashikg / WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine
MIT License

'RuntimeError: stft input and window must be on the same device but got self on cuda:1 and window on cuda:0' when specify "device_index = 1" of "whisper_s2t.load_model" #56

Open JH90iOS opened 7 months ago

JH90iOS commented 7 months ago

I have four Tesla V100 GPUs. When I specify device_index=1, the "transcribe_with_vad" method fails with the error below.

Here is my code:

    import whisper_s2t

    model = whisper_s2t.load_model(
        model_identifier="large-v2",
        backend="CTranslate2",
        compute_type="int8",
        device_index=1,
    )

Here is the error log:

`
  File "/home/asr/code/src/asr_server/modules/transcribe/test.py", line 34, in test
    out, infos = model.transcribe_with_vad(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/backends/__init__.py", line 215, in transcribe_with_vad
    for (
  File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/data.py", line 264, in get_data_loader_with_vad
    start_ends, audio_signal, audio_duration = self.speech_segmenter(
                                               ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/speech_segmenter/__init__.py", line 148, in __call__
    speech_probs = self.vad_model(audio_signal)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/speech_segmenter/frame_vad.py", line 128, in __call__
    speech_probs = self.forward(input_signal, input_signal_length)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/asr/code/src/asr_server/modules/transcribe/whisper_s2t/speech_segmenter/frame_vad.py", line 104, in forward
    x, x_len = self.vad_pp(input_signal_pt, input_signal_length_pt)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/virtualenvs/asr-venv-py3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/preprocessor/___torch_mangle_6.py", line 23, in forward
    _9 = [torch.size(input0, 1), torch.size(input0, 2)]
    input1 = torch.view(input0, _9)
    x = torch.stft(input1, 512, 160, 400, CONSTANTS.c4, False, None, True)


    x0 = torch.view_as_real(x)
    x1 = torch.sqrt(torch.sum(torch.pow(x0, 2), [-1]))

Traceback of TorchScript, original code (most recent call last):
/usr/local/lib/python3.10/dist-packages/torch/functional.py(650): stft
/content/preprocessor.py(79): forward
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1508): _slow_forward
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1527): _call_impl
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/content/preprocessor.py(252): forward
/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py(16): decorate_autocast
/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py(115): decorate_context
/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py(115): decorate_context
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1508): _slow_forward
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1527): _call_impl
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/content/preprocessor.py(448): forward
/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py(115): decorate_context
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1508): _slow_forward
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1527): _call_impl
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
/usr/local/lib/python3.10/dist-packages/torch/jit/_trace.py(1065): trace_module
/content/preprocessor.py(463): export
/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py(115): decorate_context
<ipython-input-3-06f36c3d3277>(17): <cell line: 17>
/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py(3553): run_code
/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py(3473): run_ast_nodes
/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py(3257): run_cell_async
/usr/local/lib/python3.10/dist-packages/IPython/core/async_helpers.py(78): _pseudo_sync_runner
/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py(3030): _run_cell
/usr/local/lib/python3.10/dist-packages/IPython/core/interactiveshell.py(2975): run_cell
/usr/local/lib/python3.10/dist-packages/ipykernel/zmqshell.py(539): run_cell
/usr/local/lib/python3.10/dist-packages/ipykernel/ipkernel.py(302): do_execute
/usr/local/lib/python3.10/dist-packages/tornado/gen.py(234): wrapper
/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py(539): execute_request
/usr/local/lib/python3.10/dist-packages/tornado/gen.py(234): wrapper
/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py(261): dispatch_shell
/usr/local/lib/python3.10/dist-packages/tornado/gen.py(234): wrapper
/usr/local/lib/python3.10/dist-packages/ipykernel/kernelbase.py(361): process_one
/usr/local/lib/python3.10/dist-packages/tornado/gen.py(786): run
/usr/local/lib/python3.10/dist-packages/tornado/gen.py(825): inner
/usr/local/lib/python3.10/dist-packages/tornado/ioloop.py(738): _run_callback
/usr/local/lib/python3.10/dist-packages/tornado/ioloop.py(685): <lambda>
/usr/lib/python3.10/asyncio/events.py(80): _run
/usr/lib/python3.10/asyncio/base_events.py(1909): _run_once
/usr/lib/python3.10/asyncio/base_events.py(603): run_forever
/usr/local/lib/python3.10/dist-packages/tornado/platform/asyncio.py(195): start
/usr/local/lib/python3.10/dist-packages/ipykernel/kernelapp.py(619): start
/usr/local/lib/python3.10/dist-packages/traitlets/config/application.py(992): launch_instance
/usr/local/lib/python3.10/dist-packages/colab_kernel_launcher.py(37): <module>
/usr/lib/python3.10/runpy.py(86): _run_code
/usr/lib/python3.10/runpy.py(196): _run_module_as_main
RuntimeError: stft input and window must be on the same device but got self on cuda:1 and window on cuda:0

`
JH90iOS commented 7 months ago

I suspect the problem is in the default VAD implementation, because transcribing without VAD works well.
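
A possible workaround, sketched under assumptions rather than confirmed as a fix: the error message suggests that the STFT window inside the traced VAD preprocessor stays on cuda:0 while the audio tensor is moved to cuda:1. If that is the cause, exposing only the desired GPU to the process through the `CUDA_VISIBLE_DEVICES` environment variable makes that GPU appear as index 0 inside the process, so both tensors would end up on the same device. The helper name `pin_process_to_gpu` is hypothetical, and the variable has to be set before any library initializes CUDA.

```python
import os


def pin_process_to_gpu(physical_index: int) -> int:
    """Expose a single physical GPU to this process.

    Hypothetical helper, not part of WhisperS2T. It sets
    CUDA_VISIBLE_DEVICES so that only the chosen GPU is visible,
    and returns the index that GPU has inside the process,
    which is always 0 when exactly one device is visible.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_index)
    return 0


# Call this before importing torch, whisper_s2t, or any other
# library that may initialize CUDA; later changes are ignored
# once CUDA has been initialized.
device_index = pin_process_to_gpu(1)
```

With this in place, the `load_model` call from the report would be made with `device_index=device_index` (that is, 0) instead of `device_index=1`. The same effect can be had by launching the script with `CUDA_VISIBLE_DEVICES=1` set in the shell. This sidesteps the mismatch rather than fixing the VAD code itself.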