ictnlp / StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
https://ictnlp.github.io/StreamSpeech-site/
MIT License

RuntimeError: Input tensor has to be 2D. - When using the Web GUI demo with my own audio (.mp3) #14

Ztfrederickzheng opened this issue 3 months ago (status: open)

Ztfrederickzheng commented 3 months ago

```
INFO:werkzeug:127.0.0.1 - - [09/Aug/2024 14:15:00] "POST /upload HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [09/Aug/2024 14:15:00] "GET /uploads/testing2.MP3?latency=320 HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/home/zheng/anaconda3/envs/streamspeech/lib/python3.10/site-packages/flask/app.py", line 1498, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/zheng/anaconda3/envs/streamspeech/lib/python3.10/site-packages/flask/app.py", line 1476, in wsgi_app
    response = self.handle_exception(e)
  File "/home/zheng/anaconda3/envs/streamspeech/lib/python3.10/site-packages/flask/app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/zheng/anaconda3/envs/streamspeech/lib/python3.10/site-packages/flask/app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/zheng/anaconda3/envs/streamspeech/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/zheng/anaconda3/envs/streamspeech/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/home/zheng/fuchengzheng/steamspeech/StreamSpeech/demo/app.py", line 909, in uploaded_file
    run(path)
  File "/home/zheng/fuchengzheng/steamspeech/StreamSpeech/demo/app.py", line 836, in run
    action=agent.policy()
  File "/home/zheng/anaconda3/envs/streamspeech/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/zheng/fuchengzheng/steamspeech/StreamSpeech/demo/app.py", line 468, in policy
    feature = self.feature_extractor(self.states.source)
  File "/home/zheng/fuchengzheng/steamspeech/StreamSpeech/demo/app.py", line 100, in __call__
    waveform, sample_rate = convert_waveform(
  File "/home/zheng/fuchengzheng/steamspeech/StreamSpeech/fairseq/fairseq/data/audio/audio_utils.py", line 60, in convert_waveform
    converted, converted_sample_rate = ta_sox.apply_effects_tensor(
  File "/home/zheng/anaconda3/envs/streamspeech/lib/python3.10/site-packages/torchaudio/sox_effects/sox_effects.py", line 156, in apply_effects_tensor
    return sox_ext.apply_effects_tensor(tensor, sample_rate, effects, channels_first)
  File "/home/zheng/anaconda3/envs/streamspeech/lib/python3.10/site-packages/torch/_ops.py", line 1061, in __call__
    return self._op(*args, **(kwargs or {}))
RuntimeError: Input tensor has to be 2D.
```
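The traceback ends in torchaudio's `apply_effects_tensor` (called from fairseq's `convert_waveform`), which rejects the incoming waveform because it is not a 2D tensor. Since the workaround reported in this thread is converting the upload to 16 kHz mono, checking a file's channel count and sample rate before uploading may save a round trip. Below is a minimal sketch using only Python's standard `wave` module. It assumes a PCM WAV file (`wave` cannot read MP3, so a file like `testing2.MP3` would need `ffprobe` or similar instead), and it assumes 16 kHz mono is what the demo expects, which is inferred from the ffmpeg workaround in this thread rather than confirmed in `demo/app.py`. The helper names `describe_wav` and `check_upload` are hypothetical.

```python
import wave

# Hypothetical helpers, not part of StreamSpeech.
# Assumption: the demo wants 16 kHz mono, based on the ffmpeg workaround in this thread.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1


def describe_wav(path_or_file):
    """Return (channels, sample_rate, n_frames) of a PCM WAV file or file-like object."""
    with wave.open(path_or_file, "rb") as wf:
        return wf.getnchannels(), wf.getframerate(), wf.getnframes()


def check_upload(path_or_file):
    """Return a list of problems; an empty list means the file matches the assumed format."""
    channels, rate, _ = describe_wav(path_or_file)
    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected {EXPECTED_CHANNELS} channel(s), got {channels}")
    if rate != EXPECTED_RATE:
        problems.append(f"expected {EXPECTED_RATE} Hz, got {rate} Hz")
    return problems
```

For example, `check_upload("output_mono_16khz.wav")` returns an empty list for a 16 kHz mono file, while a 44.1 kHz stereo WAV yields one problem for each mismatch.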

thetushargoyal commented 3 months ago

Hey, were you able to solve this issue?

annalenahansen commented 1 month ago

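I fixed it by converting the audio to 16 kHz mono with ffmpeg (`-ar 16000` sets the sample rate to 16 kHz, `-ac 1` downmixes to a single channel):

```shell
ffmpeg -i output.wav -ar 16000 -ac 1 output_mono_16khz.wav
```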
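If ffmpeg is not available, the same two steps (downmix to one channel, resample to 16 kHz) can be roughly approximated in pure Python. This is a sketch under stated assumptions, not a reproduction of what ffmpeg does internally: it only handles 16-bit PCM WAV input (an MP3 still needs a decoder such as ffmpeg), it averages the channels, and it resamples by linear interpolation with no anti-aliasing filter, so the quality is lower than ffmpeg's resampler. The function name `to_mono_16k` is hypothetical.

```python
import array
import sys
import wave


def to_mono_16k(src, dst, target_rate=16000):
    """Downmix a 16-bit PCM WAV to mono and linearly resample it to target_rate.

    Hypothetical, rough stdlib stand-in for `ffmpeg -i src -ar 16000 -ac 1 dst`
    (WAV input only, no anti-aliasing filter).
    """
    with wave.open(src, "rb") as wf:
        channels = wf.getnchannels()
        rate = wf.getframerate()
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        raw = wf.readframes(wf.getnframes())

    samples = array.array("h")  # signed 16-bit samples, channels interleaved
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Downmix: average the interleaved channel values of each frame.
    n_frames = len(samples) // channels
    mono = [sum(samples[i * channels:(i + 1) * channels]) / channels for i in range(n_frames)]

    # Resample: linear interpolation between neighbouring input samples.
    n_out = int(round(n_frames * target_rate / rate)) if n_frames else 0
    out = array.array("h")
    for j in range(n_out):
        pos = j * rate / target_rate
        i = int(pos)
        frac = pos - i
        a = mono[min(i, n_frames - 1)]
        b = mono[min(i + 1, n_frames - 1)]
        value = int(round(a + (b - a) * frac))
        out.append(max(-32768, min(32767, value)))  # clamp to the int16 range

    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(target_rate)
        wf.writeframes(out.tobytes())
```

For example, `to_mono_16k("output.wav", "output_mono_16khz.wav")` writes a mono 16 kHz file, mirroring the ffmpeg command in this comment. For anything beyond a quick test, ffmpeg's resampler is the better choice.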