k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0
553 stars · 109 forks

Warmup takes forever #259

Closed AmirHussein96 closed 1 year ago

AmirHussein96 commented 1 year ago

Hi @csukuangfj, I tried sherpa with the MGB-2 streaming transducer from https://github.com/k2-fsa/icefall/tree/master/egs/mgb2/ASR/pruned_transducer_stateless5. I used https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless4/export.py to generate the jit model with the following command:

```
python pruned_transducer_stateless5/export.py \
  --streaming-model 1 \
  --causal-convolution 1 \
  --jit 1 \
  --epoch 18 \
  --avg 5 \
  --bpe-model data/lang_bpe_2000/bpe.model
```

The server gets stuck at warmup forever (I waited for 2 hours and it was still stuck; see the screenshot) when I run:

```
/sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py \
  --lang-dir data/lang_bpe_2000 \
  --endpoint.rule3.min-utterance-length 1000.0 \
  --port 6006 \
  --max-batch-size 30 \
  --max-wait-ms 5 \
  --nn-pool-size 1 \
  --nn-model-filename ./mgb2/exp/cpu_jit.pt \
  --bpe-model-filename ./mgb2/data/lang_bpe_2000/bpe.model
```

Screenshot from 2022-12-30 10-01-48

Could this be because I am using a larger BPE vocabulary (bpe=2000)? I tried your model from https://k2-fsa.github.io/sherpa/python/streaming_asr/conformer/conformer_rnnt_for_English/server.html and it worked perfectly fine. Any ideas?

csukuangfj commented 1 year ago

Could you please press

ctrl + c

and post the error message?

AmirHussein96 commented 1 year ago
2022-12-30 10:26:28,100 INFO [streaming_server.py:296] Using device: cuda:0
2022-12-30 10:26:33,959 INFO [streaming_server.py:380] Warmup start
^C2022-12-30 10:27:04,352 ERROR [base_events.py:1707] Task exception was never retrieved
future: <Task finished name='Task-2' coro=<StreamingServer.stream_consumer_task() done, defined at ./sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py:396> exception=Error('The following operation failed in the TorchScript interpreter.\nTraceback of TorchScript, serialized code (most recent call last):\n  File "code/__torch__/joiner.py", line 22, in forward\n      pass\n    else:\n      ops.prim.RaiseException("AssertionError: ")\n      ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE\n    _3 = torch.slice(torch.size(encoder_out), None, -1)\n    _4 = torch.slice(torch.size(decoder_out), None, -1)\n\nTraceback of TorchScript, original code (most recent call last):\n  File "/alt-arabic/speech/amir/k2/tmp/icefall/egs/mgb2/ASR1/pruned_transducer_stateless5/joiner.py", line 55, in forward\n          Return a tensor of shape (N, T, s_range, C).\n        """\n        assert encoder_out.ndim == decoder_out.ndim == 4\n        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE\n        assert encoder_out.shape[:-1] == decoder_out.shape[:-1]\n    \nRuntimeError: AssertionError: \n')>
Traceback (most recent call last):
  File "./sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py", line 421, in stream_consumer_task
    await loop.run_in_executor(
  File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/speech/toolkits/sherpa/sherpa/bin/streaming_pruned_transducer_statelessX/beam_search.py", line 347, in process
    ) = streaming_greedy_search(
torch.jit.Error: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/joiner.py", line 22, in forward
      pass
    else:
      ops.prim.RaiseException("AssertionError: ")
      ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _3 = torch.slice(torch.size(encoder_out), None, -1)
    _4 = torch.slice(torch.size(decoder_out), None, -1)

Traceback of TorchScript, original code (most recent call last):
  File "/alt-arabic/speech/amir/k2/tmp/icefall/egs/mgb2/ASR1/pruned_transducer_stateless5/joiner.py", line 55, in forward
          Return a tensor of shape (N, T, s_range, C).
        """
        assert encoder_out.ndim == decoder_out.ndim == 4
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        assert encoder_out.shape[:-1] == decoder_out.shape[:-1]

RuntimeError: AssertionError: 

Traceback (most recent call last):
  File "./sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py", line 742, in <module>
    main()
  File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "./sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py", line 716, in main
    asyncio.run(server.run(port))
  File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/asyncio/base_events.py", line 603, in run_until_complete
    self.run_forever()
  File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
    self._run_once()
  File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/asyncio/base_events.py", line 1823, in _run_once
    event_list = self._selector.select(timeout)
  File "/speech/toolkits/espnet/tools/anaconda/envs/k2/lib/python3.8/selectors.py", line 468, in select
    fd_event_list = self._selector.poll(timeout, max_ev)
KeyboardInterrupt
csukuangfj commented 1 year ago

Could you please delete the assert statement, re-export your model and try again?
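For concreteness, the checks in question are the two `assert` lines from `pruned_transducer_stateless5/joiner.py` shown in the traceback above. A plausible explanation (an assumption on my part, not confirmed in this thread) is that streaming greedy search feeds the joiner 2-D `(N, C)` tensors, while the exported model insists on the 4-D `(N, T, s_range, C)` shapes seen during training. A plain-Python stand-in (no torch; the function name is illustrative) of the relaxed check:

```python
# Plain-Python stand-in for the shape check in joiner.py. The exported
# model asserts encoder_out.ndim == decoder_out.ndim == 4, which holds
# for training-time 4-D tensors but (assumption) not for the 2-D tensors
# passed in during streaming greedy search.

def joiner_shape_check(encoder_ndim: int, decoder_ndim: int) -> bool:
    # Original (too strict for streaming):
    #   assert encoder_ndim == decoder_ndim == 4
    # Relaxed: only require the two inputs to have matching rank.
    assert encoder_ndim == decoder_ndim, "encoder/decoder rank mismatch"
    return True

print(joiner_shape_check(4, 4))  # training-time shapes pass
print(joiner_shape_check(2, 2))  # streaming-time shapes now pass as well
```

Deleting (or relaxing) the asserts and re-running `export.py` produces a `cpu_jit.pt` that no longer raises inside the TorchScript interpreter during warmup.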

AmirHussein96 commented 1 year ago

@csukuangfj can you please elaborate on which assert statement you mean?

csukuangfj commented 1 year ago

Please see the above error message you just posted.

It is from joiner.py

AmirHussein96 commented 1 year ago

Thank you, it is working now. I recorded a small demo using my mic; check it out here: https://youtu.be/2uh3zVAFyQ4

csukuangfj commented 1 year ago

Thanks! I will update the documentation to add a link to the video.

Would you mind also recording a video showing how endpointing works? (You only need to pause for a while, e.g., 2 seconds, before you say the next sentence.)

AmirHussein96 commented 1 year ago

Thanks @csukuangfj, here is the video with pauses: https://youtu.be/t2SlrzgMd_k. By the way, I noticed that the streaming pruned stateless transducer model trained on MGB-2 is very robust to noise and background music, but when I switch to English the model completely ignores my speech; it only transcribes when I speak Modern Standard Arabic. Compare this to the ESPnet blockwise streaming transformer/conformer trained on MGB-2, which is more sensitive to noise and music but transcribes everything I say; even when I switch completely to English, it still transcribes my speech in Arabic script. Is there any reason why the streaming pruned stateless transducer ignores my English speech? How can I make it more sensitive to any words being said?

This is the command I am using:

```
./sherpa/bin/streaming_pruned_transducer_statelessX/streaming_server.py \
  --decoding-method fast_beam_search \
  --decode-left-context 32 \
  --decode-chunk-size 16 \
  --lang-dir data/lang_bpe_2000 \
  --endpoint.rule3.min-utterance-length 1000.0 \
  --port 6006 \
  --max-batch-size 50 \
  --max-wait-ms 5 \
  --nn-pool-size 1 \
  --nn-model-filename ./mgb2/exp/cpu_jit.pt \
  --bpe-model-filename ./mgb2/data/lang_bpe_2000/bpe.model
```
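As an aside on the flags above: sherpa's endpointing rule 3 forces an endpoint once an utterance exceeds a minimum length, so `--endpoint.rule3.min-utterance-length 1000.0` effectively disables that rule for realistic utterances, leaving the trailing-silence rules to detect endpoints. A minimal sketch of rule 3's logic (the function name is illustrative, not sherpa's actual API):

```python
# Illustrative sketch (hypothetical helper, not sherpa's API) of
# endpointing rule 3: an endpoint is forced once the decoded utterance
# grows longer than min_utterance_length seconds.

def rule3_triggered(utterance_seconds: float, min_utterance_length: float) -> bool:
    return utterance_seconds > min_utterance_length

# With the 1000-second setting from the command above, rule 3 never
# fires for realistic utterances.
print(rule3_triggered(15.0, 1000.0))  # False
print(rule3_triggered(15.0, 10.0))    # True
```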

csukuangfj commented 1 year ago

here is the video with pauses https://youtu.be/t2SlrzgMd_k.

Thanks!


Is there any reason why the streaming pruned stateless transducer ignores my English speech? How can I make it more sensitive to any words being said?

Sorry, I don't have any suggestions for that.

csukuangfj commented 1 year ago

By the way, the Arabic endpointing demo has been added to

https://k2-fsa.github.io/sherpa/python/streaming_asr/endpointing.html#endpointing-demo-arabic

Screenshot 2023-01-02 at 20 36 12

AmirHussein96 commented 1 year ago

Awesome thank you @csukuangfj