A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
from funasr_onnx import Paraformer
from pathlib import Path
model_dir = "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
# model_dir = "damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
model = Paraformer(model_dir, batch_size=1, quantize=False)
# model = Paraformer(model_dir, batch_size=1, device_id=0)  # run on GPU 0
# with the paraformer-large-vad-punc model, set plot_timestamp_to="./xx.png" to also save an alignment figure in addition to the timestamps
# model = Paraformer(model_dir, batch_size=1, plot_timestamp_to="test.png")
wav_path = ['/work/lxy/audio/segment_0-120.wav']
result = model(wav_path)
print(result)
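To compare CPU and GPU runs of the snippet above on equal terms, it can help to express speed as a real-time factor (RTF): processing time divided by audio duration. Below is a minimal standard-library sketch. It assumes the input is a PCM WAV file that Python's `wave` module can read. The names `wav_duration_seconds` and `real_time_factor` are hypothetical helpers and are not part of funasr_onnx.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / float(wf.getframerate())


def real_time_factor(elapsed_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio length; RTF < 1 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return elapsed_seconds / audio_seconds
```

For example, a 120 s clip that takes 6 s to decode has an RTF of 6 / 120 = 0.05. A device that is five times slower on the same clip would show an RTF five times larger.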
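The ONNX offline demo runs very slowly on GPU, roughly 5x slower than on CPU
With the PyTorch demo, the GPU is faster than the CPU, which is expected. With the ONNX demo, however, inference on a 120 s audio clip with device=0 is about 5 times slower than on CPU. The C++ version behaves the same way and is even slower than the Python demo.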
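When comparing devices, a single timed call can overstate GPU latency. The first call may include one-time initialization costs, such as setting up the device and allocating memory. Below is a minimal, library-agnostic sketch that runs a few untimed warm-up calls and then reports the median of several timed calls. `time_call` and its `warmup`/`repeats` parameters are hypothetical. This does not explain a steady-state slowdown by itself, but it separates start-up cost from per-call cost.

```python
import statistics
import time
from typing import Any, Callable, List


def time_call(fn: Callable[[], Any], warmup: int = 2, repeats: int = 5) -> float:
    """Return the median wall-clock time of fn() in seconds, excluding warm-up calls."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        fn()  # untimed: absorbs one-time initialization costs
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Using `model` and `wav_path` from the snippet above, `time_call(lambda: model(wav_path))` could be measured once with the CPU-configured model and once with the `device_id=0` model. If the GPU is still slower after warm-up, the slowdown is in per-call inference rather than start-up.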
What's your environment?
Installation method (pip, source): pip