wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0

grpc server disk pressure #1480

Closed oshindow closed 2 years ago

oshindow commented 2 years ago

Hi! I'm using the onnx model plus TLG.fst and grpc server in wenet.

The wenet grpc server hangs and the grpc clients print Recognize rpc failed because of disk pressure (16-core CPU, workers=16) when I start 5 grpc clients that send audio continuously at the same time.

Note that I have commented out the server-side decoding logs (keeping the logs for loading the model), and this error occurs after each client has sent about 3000 audio files.

  1. What could be the reason for it?
  2. Are there any other I/O operations besides logging?
  3. If batch decoding were implemented in the grpc service, would it solve this problem?

oshindow commented 2 years ago

The log is:

I1005 08:38:02.271981 8 params.h:125] Reading onnx model
I1005 08:38:03.402464 8 onnx_asr_model.cc:130] Onnx Model Info:
I1005 08:38:03.402518 8 onnx_asr_model.cc:131] encoder_output_size 512
I1005 08:38:03.402524 8 onnx_asr_model.cc:132] num_blocks 12
I1005 08:38:03.402534 8 onnx_asr_model.cc:133] head 8
I1005 08:38:03.402537 8 onnx_asr_model.cc:134] cnn_module_kernel 15
I1005 08:38:03.402545 8 onnx_asr_model.cc:135] subsampling_rate 4
I1005 08:38:03.402551 8 onnx_asr_model.cc:136] right_context 6
I1005 08:38:03.402557 8 onnx_asr_model.cc:137] sos 5537
I1005 08:38:03.402562 8 onnx_asr_model.cc:138] eos 5537
I1005 08:38:03.402570 8 onnx_asr_model.cc:139] is bidirectional decoder 1
I1005 08:38:03.402583 8 onnx_asr_model.cc:140] chunk_size -1
I1005 08:38:03.402593 8 onnx_asr_model.cc:141] num_left_chunks -1
I1005 08:38:03.402606 8 onnx_asr_model.cc:144] Onnx Encoder:
I1005 08:38:03.402632 8 onnx_asr_model.cc:55] Input 0 : name=chunk type=1 dims=1 -1 80
I1005 08:38:03.402644 8 onnx_asr_model.cc:55] Input 1 : name=offset type=7 dims=
I1005 08:38:03.402663 8 onnx_asr_model.cc:55] Input 2 : name=att_cache type=1 dims=12 8 -1 128
I1005 08:38:03.402675 8 onnx_asr_model.cc:55] Input 3 : name=cnn_cache type=1 dims=12 1 512 14
I1005 08:38:03.402689 8 onnx_asr_model.cc:73] Output 0 : name=output type=1 dims=-1 -1 -1
I1005 08:38:03.402698 8 onnx_asr_model.cc:73] Output 1 : name=r_att_cache type=1 dims=12 8 -1 128
I1005 08:38:03.402710 8 onnx_asr_model.cc:73] Output 2 : name=r_cnn_cache type=1 dims=12 1 512 -1
I1005 08:38:03.402719 8 onnx_asr_model.cc:146] Onnx CTC:
I1005 08:38:03.402734 8 onnx_asr_model.cc:55] Input 0 : name=hidden type=1 dims=1 -1 512
I1005 08:38:03.402748 8 onnx_asr_model.cc:73] Output 0 : name=probs type=1 dims=1 -1 5538
I1005 08:38:03.402755 8 onnx_asr_model.cc:148] Onnx Rescore:
I1005 08:38:03.402768 8 onnx_asr_model.cc:55] Input 0 : name=hyps type=7 dims=-1 -1
I1005 08:38:03.402777 8 onnx_asr_model.cc:55] Input 1 : name=hyps_lens type=7 dims=-1
I1005 08:38:03.402788 8 onnx_asr_model.cc:55] Input 2 : name=encoder_out type=1 dims=1 -1 512
I1005 08:38:03.402797 8 onnx_asr_model.cc:73] Output 0 : name=score type=1 dims=-1 -1 5538
I1005 08:38:03.402812 8 onnx_asr_model.cc:73] Output 1 : name=r_score type=1 dims=-1 -1 5538
I1005 08:38:03.402824 8 params.h:145] Reading unit table /home/20220506_u2pp_conformer_libtorch/units.txt
I1005 08:38:03.409274 8 params.h:153] Reading fst /home/20220506_u2pp_conformer_libtorch/TLG.fst
I1005 08:38:03.478689 8 fst.h:799] FstImpl::ReadHeader: source: /home/20220506_u2pp_conformer_libtorch/TLG.fst, fst_type: vector, arc_type: standard, version: 2, flags: 0
I1005 08:40:45.654079 8 params.h:159] Reading symbol table /home/20220506_u2pp_conformer_libtorch/words.txt
I1005 08:40:47.768419 8 grpc_server_main.cc:47] Listening at port 10086
F1005 08:46:34.691398 3133 grpc_server.cc:52] Check failed: feature_pipeline_ != nullptr
Check failure stack trace:
@ 0x7f1629220068 google::LogMessage::Fail()
@ 0x7f162921ffb0 google::LogMessage::SendToLog()
@ 0x7f162921f8f1 google::LogMessage::Flush()
@ 0x7f16292231a2 google::LogMessageFatal::~LogMessageFatal()
@ 0x5647c78643f5 wenet::GrpcConnectionHandler::OnSpeechEnd()
@ 0x5647c7864bc9 wenet::GrpcConnectionHandler::operator()()
@ 0x5647c786a559 std::__invoke_impl<>()
@ 0x5647c7868fc5 std::__invoke<>()
@ 0x5647c786fcde _ZNSt6thread8_InvokerISt5tupleIJN5wenet21GrpcConnectionHandlerEEEE9_M_invokeIJLm0EEEEDTcl8__invokespcl10_S_declvalIXT_EEEEESt12_Index_tupleIJXspT_EEE
@ 0x5647c786f4ac std::thread::_Invoker<>::operator()()
@ 0x5647c786ed56 std::thread::_State_impl<>::_M_run()
@ 0x7f1627ada6df (unknown)
@ 0x7f16295086db start_thread
@ 0x7f162719761f clone
@ (nil) (unknown)
/home/wenet/runtime/onnxruntime/grpc_server.sh: line 94: 8 Aborted (core dumped) /home/wenet/runtime/onnxruntime/build/bin/grpc_server_main --port $port --workers $nj --chunk_size $chunk_size --ctc_weight $ctc_weight --rescoring_weight $rescoring_weight $wfst_decode_opts --dict_path $model_dir/words.txt --onnx_dir $onnx_dir --unit_path $model_dir/units.txt
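
In a dump like this, the entry that matters is the one with severity F (fatal), here raised from grpc_server.cc:52 on thread 3133. A minimal sketch for pulling such entries out of a pasted log, assuming the glog header layout seen above (severity letter, MMDD, time, thread id, file:line]). The helper names split_glog and fatal_entries are hypothetical and not part of wenet or glog.

```python
import re

# Assumed glog header layout, as seen in the log above:
#   <severity><MMDD> <HH:MM:SS.uuuuuu> <thread id> <file>:<line>] <message>
GLOG_HEADER = re.compile(
    r"(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[\w./-]+:\d+)\] "
)


def split_glog(text):
    """Split a glog dump into entries, even if line breaks were lost."""
    matches = list(GLOG_HEADER.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # An entry's message runs until the next header (or the end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append({
            "severity": m.group("sev"),
            "time": m.group("time"),
            "thread": int(m.group("tid")),
            "source": m.group("src"),
            "message": " ".join(text[m.end():end].split()),
        })
    return entries


def fatal_entries(text):
    """Return only the entries logged with severity F."""
    return [e for e in split_glog(text) if e["severity"] == "F"]


# Two entries taken from the log above, joined on one line on purpose.
sample = (
    "I1005 08:40:47.768419 8 grpc_server_main.cc:47] Listening at port 10086 "
    "F1005 08:46:34.691398 3133 grpc_server.cc:52] "
    "Check failed: feature_pipeline_ != nullptr"
)

for entry in fatal_entries(sample):
    print(entry["time"], entry["thread"], entry["source"], entry["message"])
```

Because the last entry's message runs to the end of the text, a stack trace that follows the fatal line is folded into that message.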

oshindow commented 2 years ago

A wav file with an incorrect sample rate in the test set caused this problem. I have solved it. Thank you!
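
Since the cause was a wav file with the wrong sample rate, it may help to scan a test set before sending it to the server. The following is a rough sketch using only Python's standard wave module. The expected rate of 16000 Hz is an assumption (the thread does not state which rate the model expects), and find_mismatched is a hypothetical helper, not part of wenet.

```python
import wave
from pathlib import Path

# Assumption: the model expects 16 kHz audio; change this to match your model.
EXPECTED_RATE = 16000


def wav_sample_rate(path):
    """Return the sample rate stored in the header of a PCM wav file."""
    with wave.open(str(path), "rb") as f:
        return f.getframerate()


def find_mismatched(wav_dir, expected=EXPECTED_RATE):
    """List (path, reason) for wav files that are unreadable or off-rate."""
    bad = []
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        try:
            rate = wav_sample_rate(path)
        except (wave.Error, EOFError) as err:
            # Truncated or non-PCM files are reported instead of aborting the scan.
            bad.append((path, f"unreadable: {err}"))
            continue
        if rate != expected:
            bad.append((path, f"{rate} Hz"))
    return bad
```

Calling find_mismatched("path/to/testset") returns an empty list when every file matches the expected rate; any file it does report can be resampled or dropped before the clients start sending audio.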