k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

Keyword spotter model takes 30 seconds to load! How can this be optimized? #1237

Closed: shaojianglee closed this issue 2 months ago

shaojianglee commented 2 months ago

The keyword spotter model takes 30 seconds to load! How can this be optimized?

`/home/suser/sherpa-onnx-1.10.20/sherpa-onnx/c-api/c-api.cc:SherpaOnnxCreateKeywordSpotter:637 KeywordSpotterConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="/home/suser/test/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/encoder-epoch-12-avg-2-chunk-16-left-64.onnx", decoder="/home/suser/test/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/decoder-epoch-12-avg-2-chunk-16-left-64.onnx", joiner="/home/suser/test/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/joiner-epoch-12-avg-2-chunk-16-left-64.onnx"), paraformer=OnlineParaformerModelConfig(encoder="", decoder=""), wenet_ctc=OnlineWenetCtcModelConfig(model="", chunk_size=16, num_left_chunks=4), zipformer2_ctc=OnlineZipformer2CtcModelConfig(model=""), nemo_ctc=OnlineNeMoCtcModelConfig(model=""), provider_config=ProviderConfig(device=0, provider="cpu", cuda_config=CudaConfig(cudnn_conv_algo_search=1), trt_config=TensorrtConfig(trt_max_workspace_size=2147483647, trt_max_partition_iterations=10, trt_min_subgraph_size=5, trt_fp16_enable="True", trt_detailed_build_log="False", trt_engine_cache_enable="True", trt_engine_cache_path=".", trt_timing_cache_enable="True", trt_timing_cache_path=".",trt_dump_subgraphs="False" )), tokens="/home/suser/test/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/tokens.txt", num_threads=4, warm_up=0, debug=True, model_type="", modeling_unit="cjkchar", bpe_vocab=""), max_active_paths=4, num_trailing_blanks=1, keywords_score=1, keywords_threshold=0.25, keywords_file="/home/suser/test/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/keywords.txt")

/home/suser/sherpa-onnx-1.10.20/sherpa-onnx/csrc/online-transducer-model.cc:GetModelType:52 num_heads=4,4,4,8,4,4 cnn_module_kernels=31,31,15,15,15,31 encoder_dims=128,128,128,128,128,128 query_head_dims=32,32,32,32,32,32 T=45 value_head_dims=12,12,12,12,12,12 decode_chunk_len=32 left_context_len=64,32,16,8,16,32 num_encoder_layers=1,1,1,1,1,1 comment=streaming zipformer2 version=1 model_author=k2-fsa model_type=zipformer2

/home/suser/sherpa-onnx-1.10.20/sherpa-onnx/csrc/online-zipformer2-transducer-model.cc:InitEncoder:100 ---encoder--- num_heads=4,4,4,8,4,4 cnn_module_kernels=31,31,15,15,15,31 encoder_dims=128,128,128,128,128,128 query_head_dims=32,32,32,32,32,32 T=45 value_head_dims=12,12,12,12,12,12 decode_chunk_len=32 left_context_len=64,32,16,8,16,32 num_encoder_layers=1,1,1,1,1,1 comment=streaming zipformer2 version=1 model_author=k2-fsa model_type=zipformer2

/home/suser/sherpa-onnx-1.10.20/sherpa-onnx/csrc/online-zipformer2-transducer-model.cc:operator():122 encoder_dims: 128 128 128 128 128 128

/home/suser/sherpa-onnx-1.10.20/sherpa-onnx/csrc/online-zipformer2-transducer-model.cc:operator():122 query_head_dims: 32 32 32 32 32 32

/home/suser/sherpa-onnx-1.10.20/sherpa-onnx/csrc/online-zipformer2-transducer-model.cc:operator():122 value_head_dims: 12 12 12 12 12 12

/home/suser/sherpa-onnx-1.10.20/sherpa-onnx/csrc/online-zipformer2-transducer-model.cc:operator():122 num_heads: 4 4 4 8 4 4

/home/suser/sherpa-onnx-1.10.20/sherpa-onnx/csrc/online-zipformer2-transducer-model.cc:operator():122 num_encoder_layers: 1 1 1 1 1 1

/home/suser/sherpa-onnx-1.10.20/sherpa-onnx/csrc/online-zipformer2-transducer-model.cc:operator():122 cnn_module_kernels: 31 31 15 15 15 31

/home/suser/sherpa-onnx-1.10.20/sherpa-onnx/csrc/online-zipformer2-transducer-model.cc:operator():122 left_context_len: 64 32 16 8 16 32

/home/suser/sherpa-onnx-1.10.20/sherpa-onnx/csrc/online-zipformer2-transducer-model.cc:InitEncoder:131 T: 45

/home/suser/sherpa-onnx-1.10.20/sherpa-onnx/csrc/online-zipformer2-transducer-model.cc:InitEncoder:132 decode_chunklen: 32

/home/suser/sherpa-onnx-1.10.20/sherpa-onnx/csrc/online-zipformer2-transducer-model.cc:InitDecoder:153 ---decoder--- vocab_size=197 context_size=2

/home/suser/sherpa-onnx-1.10.20/sherpa-onnx/csrc/online-zipformer2-transducer-model.cc:InitJoiner:178 ---joiner--- joiner_dim=320

Current sample rate: 16000

Recording started!

Use recording device: plughw:2,0`

csukuangfj commented 2 months ago

Please describe your runtime environment.

So far, you are the first user to report this problem.
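One way to answer the request for environment details is to collect them programmatically and to time the slow step in isolation. The following is a minimal sketch using only the Python standard library; the helper names `describe_environment`, `time_call`, and `read_file_fully` are hypothetical and not part of sherpa-onnx. It assumes the delay could come either from reading the model files off slow storage or from model initialization itself, and it measures only what you pass to it.

```python
import os
import platform
import time


def describe_environment():
    """Collect basic machine facts that are useful in a bug report."""
    return {
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
    }


def time_call(load_fn):
    """Run load_fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = load_fn()
    elapsed = time.perf_counter() - start
    return result, elapsed


def read_file_fully(path):
    """Read a file into memory and return its size in bytes.

    Timing this separately shows how much of the delay is plain disk I/O.
    """
    with open(path, "rb") as f:
        return len(f.read())


if __name__ == "__main__":
    # Print the environment summary to paste into the issue.
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

To use it, wrap each step in `time_call`, for example `time_call(lambda: read_file_fully(encoder_path))` for each `.onnx` file, and then wrap the keyword spotter constructor of whichever language binding you use. Comparing `cpu_count` with the `num_threads=4` setting shown in the log may also be worth reporting.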