Closed. rocon12933-arch closed this issue 1 month ago.
Please post the log printed when you start the server.
Attaching to sherpa-onnx-offline
sherpa-onnx-offline | 2024-09-26 07:03:05,643 INFO [non_streaming_server.py:1001] {'encoder': '', 'decoder': '', 'joiner': '', 'paraformer': '', 'sense_voice': '/app/models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.int8.onnx', 'nemo_ctc': '', 'wenet_ctc': '', 'tdnn_model': '', 'whisper_encoder': '', 'whisper_decoder': '', 'whisper_language': '', 'whisper_task': 'transcribe', 'whisper_tail_paddings': -1, 'tokens': '/app/models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt', 'num_threads': 2, 'provider': 'cpu', 'sample_rate': 16000, 'feat_dim': 80, 'decoding_method': 'greedy_search', 'max_active_paths': 4, 'hotwords_file': '', 'hotwords_score': 1.5, 'blank_penalty': 0.0, 'port': 6006, 'max_batch_size': 3, 'max_wait_ms': 5, 'nn_pool_size': 1, 'max_message_size': 1048576, 'max_queue_size': 32, 'max_active_connections': 200, 'certificate': None, 'doc_root': '/app/sherpa-onnx/python-api-examples/web'}
sherpa-onnx-offline | 2024-09-26 07:03:06,813 INFO [non_streaming_server.py:647] started
sherpa-onnx-offline | 2024-09-26 07:03:06,814 INFO [non_streaming_server.py:659] No certificate provided
sherpa-onnx-offline | 2024-09-26 07:03:06,814 INFO [server.py:715] server listening on [::]:6006
sherpa-onnx-offline | 2024-09-26 07:03:06,814 INFO [server.py:715] server listening on 0.0.0.0:6006
Where in the log can you see that ITN is being used?
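I believe non_streaming_server.py defaults to use_itn=True, and ITN does take effect for Mandarin. Running sherpa-onnx-offline-websocket-server with --sense-voice-use-itn=true gives the same result: Mandarin responses have ITN applied, while Cantonese responses contain only Chinese characters.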
sherpa-onnx-offline | /home/runner/work/sherpa-onnx/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 /usr/local/bin/sherpa-onnx-offline-websocket-server --tokens=/app/models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/tokens.txt --sense-voice-model=/app/models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17/model.onnx --sense-voice-use-itn=true --num-io-threads=4 --num-work-threads=8
sherpa-onnx-offline |
sherpa-onnx-offline | /home/runner/work/sherpa-onnx/sherpa-onnx/sherpa-onnx/csrc/offline-websocket-server.cc:main:91 Started!
sherpa-onnx-offline | /home/runner/work/sherpa-onnx/sherpa-onnx/sherpa-onnx/csrc/offline-websocket-server.cc:main:92 Listening on: 6006
sherpa-onnx-offline | /home/runner/work/sherpa-onnx/sherpa-onnx/sherpa-onnx/csrc/offline-websocket-server.cc:main:93 Number of work threads: 8
Then this is most likely an issue with the SenseVoice model itself. Could you try the official space on ModelScope? https://www.modelscope.cn/studios/iic/SenseVoice
You can also run the same audio through the script provided by FunASR.
Finally, there is one more workaround: you can use our own ITN, which is independent of the model. Search the examples for rule_fsts (the ASR examples).
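For reference, a minimal sketch of the rule_fsts workaround using the Python API. It assumes that `sherpa_onnx.OfflineRecognizer.from_sense_voice` accepts `use_itn` and `rule_fsts` in your installed version (please check against the ASR examples), and that `./itn_zh_number.fst` is a placeholder path to an ITN rule FST you have downloaded. The helper names `sense_voice_kwargs` and `build_recognizer` are made up for illustration. Whether a given rule FST covers Cantonese output is not confirmed in this thread.

```python
from pathlib import Path

# Model directory taken from the server log above.
MODEL_DIR = Path("/app/models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17")


def sense_voice_kwargs(model_dir, rule_fsts, use_itn=True):
    """Collect keyword arguments for OfflineRecognizer.from_sense_voice.

    rule_fsts is the path to a rule FST file (placeholder here); it is
    applied to the recognized text independently of the model's own ITN.
    """
    model_dir = Path(model_dir)
    return {
        "model": str(model_dir / "model.int8.onnx"),
        "tokens": str(model_dir / "tokens.txt"),
        "num_threads": 2,
        "use_itn": use_itn,      # the model's built-in ITN
        "rule_fsts": rule_fsts,  # model-independent, rule-based ITN
    }


def build_recognizer(kwargs):
    """Create the recognizer; sherpa_onnx is imported lazily on purpose."""
    import sherpa_onnx  # assumption: the sherpa-onnx Python package is installed

    return sherpa_onnx.OfflineRecognizer.from_sense_voice(**kwargs)
```

Calling `build_recognizer(sense_voice_kwargs(MODEL_DIR, "./itn_zh_number.fst"))` should then give a recognizer whose text output is post-processed by the rule FST, so comparing Mandarin and Cantonese results with and without `rule_fsts` would show whether the rules help in the Cantonese case.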
The JNI and python-api-examples behave the same way, yet ITN does work for Mandarin.