k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0
3.73k stars 432 forks

Specifying hotwords with the streaming zipformer fails on Linux #1095

Open renshujiajia opened 4 months ago

renshujiajia commented 4 months ago

I built decode-file-c-api from c-api-demo and used it to read an audio file with hotwords applied. It reports: "Cannot find ID for token THE at line: THE. (Hint: words on the same line are separated by spaces)" and "405 Failed to encode some hotwords, skip them already, see logs above for details." The command and its output are as follows:

 ./decode-file-c-api --encoder=sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx \
--decoder=sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx \
--joiner=sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-128.int8.onnx \
--decoding-method=modified_beam_search \
--tokens=sherpa-onnx-streaming-zipformer-en-2023-06-26/tokens.txt \
--modeling-unite=bpe \
--bpe-vocab=sherpa-onnx-streaming-zipformer-en-2023-06-26/bpe.vocab \
--hotwords=sherpa-onnx-streaming-zipformer-en-2023-06-26/hotwords.txt \
test_waves/0.wav 
# output
/opt/data/private/restore/root/rensj/PROJECT/ASR/sherpa-onnx-develop/sherpa-onnx/csrc/utils.cc:EncodeBase:64 Cannot find ID for token THE at line: THE. (Hint: words on the same line are separated by spaces)
/opt/data/private/restore/root/rensj/PROJECT/ASR/sherpa-onnx-develop/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:InitHotwords:405 Failed to encode some hotwords, skip them already, see logs above for details.
sample rate: 16000, num samples: 106000, duration: 6.62 s
0: AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT
  UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS

pkufool commented 4 months ago

Please provide the model link and the hotwords (the contents of hotwords.txt).
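
While gathering those files, a quick local check may help show which hotword entries fail the token lookup reported in the log. The following is only a minimal sketch, not part of sherpa-onnx: it assumes tokens.txt holds one `symbol id` pair per line and that each hotwords line is a list of space-separated tokens (as the hint in the error message suggests). The function names `load_token_table` and `find_missing_tokens` are hypothetical, and skipping entries that start with `:` (treated here as an optional score) is also an assumption.

```python
from typing import Dict, Iterable, List, Tuple


def load_token_table(lines: Iterable[str]) -> Dict[str, int]:
    """Parse `symbol id` lines (the assumed tokens.txt layout) into a dict."""
    table: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        symbol, idx = parts
        table[symbol] = int(idx)
    return table


def find_missing_tokens(
    hotword_lines: Iterable[str], table: Dict[str, int]
) -> List[Tuple[int, str]]:
    """Return (line number, token) for every token with no ID in the table."""
    missing: List[Tuple[int, str]] = []
    for lineno, line in enumerate(hotword_lines, start=1):
        for item in line.split():
            if item.startswith(":"):
                continue  # assumption: a trailing ":<number>" is a score
            if item not in table:
                missing.append((lineno, item))
    return missing


if __name__ == "__main__":
    # Toy data for illustration only; real runs would read the two files,
    # e.g. open("tokens.txt", encoding="utf-8").
    tokens = load_token_table(["<blk> 0", "▁THE 1", "▁QU 2", "AR 3", "TER 4"])
    hotwords = ["THE", "▁QU AR TER"]
    for lineno, item in find_missing_tokens(hotwords, tokens):
        print(f"line {lineno}: no ID for token {item!r}")
```

With the toy data, only the raw word `THE` on line 1 is flagged, because the table contains the piece `▁THE` but not `THE`. If the real tokens.txt likewise holds only BPE pieces, a hotword written as a plain word would be reported as missing, which would be consistent with the error above.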