How to batch-recognize multiple audio files

yeyupiaoling / PPASR

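End-to-end Chinese speech recognition implemented with PaddlePaddle, from getting started to production use: very simple introductory examples and highly practical enterprise projects. Supports the currently most popular models: DeepSpeech2, Conformer, and Squeezeformer.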

Apache License 2.0

797 stars, 131 forks

How to batch-recognize multiple audio files #130

Closed. AlexandaJerry closed this issue 1 year ago

AlexandaJerry commented 1 year ago

[Screenshot: 微信图片_20221221103215]

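Hello, I would like to use PPASR to batch-recognize multiple audio files in a folder; the screenshot shows the approach I came up with. However, I found that with this approach the model is loaded and the decoder is initialized again for every single audio file, which seems to slow down the overall run. Is there a way to make batch recognition of multiple audio files more efficient?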

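Alternatively, you could tell me which part of the code to change to achieve this, and I can study and modify it myself. Many thanks! (Currently, recognizing 24 minutes of audio takes about 12 minutes.)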

yeyupiaoling commented 1 year ago

Refer to the code in infer_path.py. The relevant snippet is:

import os

from ppasr.predict import PPASRPredictor

# Create the recognizer once (configs and args are defined earlier in infer_path.py)
predictor = PPASRPredictor(configs=configs,
                           model_path=args.model_path.format(configs['use_model'], configs['preprocess_conf']['feature_method']),
                           use_gpu=args.use_gpu,
                           use_pun=args.use_pun,
                           pun_model_dir=args.pun_model_dir)

# Reuse the same recognizer for every file in the folder
files = os.listdir('sliced_wav/')
for file in files:
    wav_path = os.path.join('sliced_wav/', file)
    result = predictor.predict(audio_data=wav_path)
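
The pattern in the snippet above (build the recognizer once, then loop over the files) can be wrapped in a small helper. The following is only a sketch and not part of PPASR: the names `list_audio_files`, `recognize_folder`, and `AUDIO_EXTENSIONS` are hypothetical, the extension list is an assumption, and the recognizer is passed in as a plain callable so the helper does not depend on any particular PPASR version.

```python
import os
from typing import Callable, Dict, Iterable, List

# Assumed set of audio formats; adjust to whatever your recognizer accepts.
AUDIO_EXTENSIONS = (".wav", ".mp3", ".flac")


def list_audio_files(folder: str,
                     extensions: Iterable[str] = AUDIO_EXTENSIONS) -> List[str]:
    """Return the sorted paths of the audio files directly inside `folder`."""
    exts = tuple(e.lower() for e in extensions)
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(exts)
    )


def recognize_folder(predict: Callable[[str], object],
                     folder: str) -> Dict[str, object]:
    """Call the same `predict` callable on every audio file in `folder`.

    A file that raises is recorded with its exception so that one bad
    file does not abort the whole batch.
    """
    results: Dict[str, object] = {}
    for path in list_audio_files(folder):
        try:
            results[path] = predict(path)
        except Exception as exc:
            results[path] = exc
    return results
```

With the `predictor` from the thread, this could be called as `recognize_folder(lambda p: predictor.predict(audio_data=p), 'sliced_wav/')`, so the model and decoder are still initialized only once.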

AlexandaJerry commented 1 year ago

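Thank you very much for your reply! One more small question: I noticed that CPU and GPU utilization is low during recognition, and I'm not sure whether that affects recognition speed. Is there a way to increase CPU and GPU utilization to speed up speech recognition?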

yeyupiaoling commented 1 year ago

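With single-file inference, utilization is naturally low, since each inference only takes a short time anyway. You can also refer to eval and run inference on a batch at a time.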
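
As a rough illustration of the batching idea, the sketch below shows only the grouping step: splitting a list of file paths into fixed-size batches. The helper name `iter_batches` is hypothetical and not a PPASR function. How each batch is turned into padded features and passed to the model is PPASR-specific and is not shown here; the eval code mentioned above is the place to look for that.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

Larger batches generally keep the GPU busier per call, at the cost of more memory, so the batch size is something to tune for the hardware at hand.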

AlexandaJerry commented 1 year ago

Thanks for the answer. PPASR's recognition accuracy is really great. I have no further questions.