AmirHussein96 closed this issue 1 year ago
Could you please press Ctrl+C and post the error message?
2022-12-30 10:26:28,100 INFO [streaming_server.py:296] Using device: cuda:0
2022-12-30 10:26:33,959 INFO [streaming_server.py:380] Warmup start
^C2022-12-30 10:27:04,352 ERROR [base_events.py:1707] Task exception was never retrieved
future: <Task finished name='Task-2' coro=<StreamingServer.stream_consumer_task() done, defined at ./sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py:396> exception=Error('The following operation failed in the TorchScript interpreter.\nTraceback of TorchScript, serialized code (most recent call last):\n File "code/__torch__/joiner.py", line 22, in forward\n pass\n else:\n ops.prim.RaiseException("AssertionError: ")\n ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE\n _3 = torch.slice(torch.size(encoder_out), None, -1)\n _4 = torch.slice(torch.size(decoder_out), None, -1)\n\nTraceback of TorchScript, original code (most recent call last):\n File "/alt-arabic/speech/amir/k2/tmp/icefall/egs/mgb2/ASR1/pruned_transducer_stateless5/joiner.py", line 55, in forward\n Return a tensor of shape (N, T, s_range, C).\n """\n assert encoder_out.ndim == decoder_out.ndim == 4\n ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE\n assert encoder_out.shape[:-1] == decoder_out.shape[:-1]\n \nRuntimeError: AssertionError: \n')>
Traceback (most recent call last):
File "./sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py", line 421, in stream_consumer_task
await loop.run_in_executor(
File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/speech/toolkits/sherpa/sherpa/bin/streaming_pruned_transducer_statelessX/beam_search.py", line 347, in process
) = streaming_greedy_search(
torch.jit.Error: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__/joiner.py", line 22, in forward
pass
else:
ops.prim.RaiseException("AssertionError: ")
~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_3 = torch.slice(torch.size(encoder_out), None, -1)
_4 = torch.slice(torch.size(decoder_out), None, -1)
Traceback of TorchScript, original code (most recent call last):
File "/alt-arabic/speech/amir/k2/tmp/icefall/egs/mgb2/ASR1/pruned_transducer_stateless5/joiner.py", line 55, in forward
Return a tensor of shape (N, T, s_range, C).
"""
assert encoder_out.ndim == decoder_out.ndim == 4
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
assert encoder_out.shape[:-1] == decoder_out.shape[:-1]
RuntimeError: AssertionError:
Traceback (most recent call last):
File "./sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py", line 742, in <module>
main()
File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "./sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py", line 716, in main
asyncio.run(server.run(port))
File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete
self.run_forever()
File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
self._run_once()
File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/asyncio/base_events.py", line 1823, in _run_once
event_list = self._selector.select(timeout)
File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/selectors.py", line 468, in select
fd_event_list = self._selector.poll(timeout, max_ev)
KeyboardInterrupt
Could you please delete the assert statement, re-export your model, and try again?
@csukuangfj, can you please clarify which assert statement you mean?
Please see the above error message you just posted.
It is from joiner.py
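For reference, the failing check is the `assert encoder_out.ndim == decoder_out.ndim == 4` in the joiner's `forward`: it assumes 4-D `(N, T, s_range, C)` training inputs, but streaming greedy search calls the joiner with 2-D `(N, C)` tensors, so the scripted model raises `AssertionError`. A minimal sketch of a joiner with the assertion relaxed (this is an illustration, not the actual icefall `joiner.py`):

```python
import torch
import torch.nn as nn


class Joiner(nn.Module):
    """Toy transducer joiner illustrating the shape issue (not icefall's code).

    Instead of requiring 4-D inputs, it only checks that the two inputs have
    the same shape, so it works for both the 4-D (N, T, s_range, C) training
    case and the 2-D (N, C) streaming greedy-search case.
    """

    def __init__(self, dim: int, vocab_size: int):
        super().__init__()
        self.output_linear = nn.Linear(dim, vocab_size)

    def forward(
        self, encoder_out: torch.Tensor, decoder_out: torch.Tensor
    ) -> torch.Tensor:
        # Relaxed check: shapes must match, but any rank is accepted.
        assert encoder_out.shape == decoder_out.shape
        logit = encoder_out + decoder_out
        return self.output_linear(torch.tanh(logit))


# Scripting still works, and the 2-D streaming call no longer asserts.
joiner = torch.jit.script(Joiner(dim=8, vocab_size=16))
out = joiner(torch.randn(2, 8), torch.randn(2, 8))
print(out.shape)
```

Deleting (or relaxing) the assert and re-exporting with `torch.jit.script` is exactly the fix suggested above.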
Thank you, it is working now. I recorded a small demo using my mic; check it out here: https://youtu.be/2uh3zVAFyQ4
Thanks! I will update the documentation to add a link to the video.
Would you mind also recording a video showing how endpointing works? (You only need to pause for a while, e.g., 2 seconds, before you say the next sentence.)
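The endpointing behavior asked about here boils down to rules over trailing silence and utterance length. A hedged sketch of such a rule, similar in spirit to sherpa's `--endpoint.rule*` options but not sherpa's actual implementation:

```python
# Toy endpoint detector: fire an endpoint when the speaker has paused long
# enough after saying something, or when the utterance is extremely long
# (cf. the --endpoint.rule3.min-utterance-length flag used below).

def should_endpoint(
    trailing_silence_s: float,
    utterance_len_s: float,
    decoded_anything: bool,
    min_trailing_silence_s: float = 2.0,
    min_utterance_len_s: float = 1000.0,
) -> bool:
    # Rule A: something was decoded and the speaker paused, e.g. for 2 s.
    if decoded_anything and trailing_silence_s >= min_trailing_silence_s:
        return True
    # Rule B: cut off pathologically long utterances regardless of silence.
    if utterance_len_s >= min_utterance_len_s:
        return True
    return False


print(should_endpoint(2.5, 10.0, True))  # paused 2.5 s after speech
print(should_endpoint(0.5, 10.0, True))  # still mid-sentence
```

This is why pausing for about 2 seconds between sentences is enough to demonstrate endpointing in the video.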
Thanks @csukuangfj, here is the video with pauses: https://youtu.be/t2SlrzgMd_k. BTW, I noticed that the streaming pruned stateless transducer model trained on MGB-2 is very robust to noise and background music, but when I switch to English the model totally ignores my speech; it only transcribes when I speak Modern Standard Arabic. Compare this to the ESPnet blockwise streaming transformer/conformer trained on MGB-2, which is more sensitive to noise and music but transcribes everything I say: even when I switch completely to English, it still transcribes my speech in Arabic script. Is there any reason why the streaming pruned stateless transducer ignores my English speech? How can I make it more sensitive to whatever is being said?
This is the command I am using: ./sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py --decoding-method fast_beam_search --decode-left-context 32 --decode-chunk-size 16 --lang-dir data/lang_bpe_2000 --endpoint.rule3.min-utterance-length 1000.0 --port 6006 --max-batch-size 50 --max-wait-ms 5 --nn-pool-size 1 --nn-model-filename ./mgb2/exp/cpu_jit.pt --bpe-model-filename ./mgb2/data/lang_bpe_2000/bpe.model
here is the video with pauses https://youtu.be/t2SlrzgMd_k.
Thanks!
Is there any reason why the streaming pruned stateless transducer ignores my English speech? How can I make it more sensitive to any words being said?
Sorry, I don't have any suggestions for that.
By the way, the Arabic endpointing demo has been added to
https://k2-fsa.github.io/sherpa/python/streaming_asr/endpointing.html#endpointing-demo-arabic
Awesome, thank you @csukuangfj!
Hi @csukuangfj, I tried sherpa with the MGB-2 streaming transducer from https://github.com/k2-fsa/icefall/tree/master/egs/mgb2/ASR/pruned_transducer_stateless5. I used https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless4/export.py to generate the JIT model with the following command:
python pruned_transducer_stateless5/export.py --streaming-model 1 --causal-convolution 1 --jit 1 --epoch 18 --avg 5 --bpe-model data/lang_bpe_2000/bpe.model
It takes forever when I run ./sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py --lang-dir data/lang_bpe_2000 --endpoint.rule3.min-utterance-length 1000.0 --port 6006 --max-batch-size 30 --max-wait-ms 5 --nn-pool-size 1 --nn-model-filename ./mgb2/exp/cpu_jit.pt --bpe-model-filename ./mgb2/data/lang_bpe_2000/bpe.model (I waited for 2 hours and it was still stuck at warmup; see the screenshot).
Could this be because I am using a larger BPE vocabulary (2000)? I tried your model from https://k2-fsa.github.io/sherpa/python/streaming_asr/conformer/conformer_rnnt_for_English/server.html and it worked perfectly fine. Any ideas?
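One thing worth ruling out when "warmup" seems stuck is TorchScript itself: the interpreter profiles and optimizes a scripted function during its first few calls, so the first invocations can be much slower than steady state. A toy sketch to see this effect, independent of sherpa (this is not sherpa's warmup code; the function is made up for illustration):

```python
import time

import torch


@torch.jit.script
def toy_forward(x: torch.Tensor) -> torch.Tensor:
    # Arbitrary workload; the point is only to time repeated scripted calls.
    return torch.relu(x @ x.t()).sum()


x = torch.randn(64, 64)
times = []
for _ in range(3):
    t0 = time.perf_counter()
    toy_forward(x)
    times.append(time.perf_counter() - t0)

# The first call typically dominates, since TorchScript specializes and
# optimizes the graph on early invocations.
print(times)
```

Timing a few dummy forward passes on the exported `cpu_jit.pt` the same way would show whether the hang is inside the model's warmup or elsewhere in the server.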