UniASR model inference is slow

MyWestCity commented 9 months ago

Environment: OS: Linux; Python: 3.8.16; modelscope: 1.9.4; funasr: 0.8.4; GPU: T4; CUDA: 11.6

Code:

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from modelscope.utils.logger import get_logger
import time
import logging
logger = get_logger(log_level=logging.CRITICAL)
logger.setLevel(logging.CRITICAL)

import soundfile

waveform, sample_rate = soundfile.read("/workspace/test_minnan.wav")

inference_pipeline_vad = pipeline(
    task=Tasks.voice_activity_detection,
    model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    model_revision=None,
)

inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_UniASR_asr_2pass-minnan-16k-common-vocab3825')

inference_pipeline_punc = pipeline(
    task=Tasks.punctuation,
    model='damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727',
    model_revision=None,
)

segments_result = inference_pipeline_vad(audio_in=waveform)
param_punc_dict = {"cache": []}
start = time.time()
for i, segments in enumerate(segments_result["text"]):
    beg_idx = segments[0] * sample_rate/1000
    end_idx = segments[1] * sample_rate/1000
    waveform_slice = waveform[int(beg_idx):int(end_idx)]
    result_segments = inference_pipeline(audio_in=waveform_slice)
    if result_segments != []:
        result_segments_withpunc = inference_pipeline_punc(
            text_in=result_segments['text'],
            param_dict=param_punc_dict)
        print(result_segments_withpunc['text'])
    else:
        print(result_segments)
end = time.time()
print(f'耗时: {end - start}')  # 耗时 = elapsed time in seconds

Output:
2023-12-12 06:56:59,345 - modelscope - INFO - PyTorch version 1.13.1+cu116 Found.
2023-12-12 06:56:59,346 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-12-12 06:56:59,386 - modelscope - INFO - Loading done! Current index file version is 1.9.4, with md5 1a88e733bc38da6d1c3ef3b5df8d7f1d and a total number of 945 components indexed
Downloading: 100%|██████████| 2.72M/2.72M [00:00<00:00, 16.9MB/s]
Downloading: 100%|██████████| 469/469 [00:00<00:00, 118kB/s]
Downloading: 100%|██████████| 275M/275M [00:23<00:00, 12.2MB/s]
Downloading: 100%|██████████| 2.72M/2.72M [00:00<00:00, 14.6MB/s]
Downloading: 100%|██████████| 7.53k/7.53k [00:00<00:00, 2.66MB/s]
[##################################################] 100%可以啊，但是你就爱食的，就你的萝卜肚无理由啊，直到遐鱼啊，大汉食袂落药就无效啊
[##################################################] 100%。会啦，你做饭时阮肯定食会累的这食好料啊，阮肯定食会累
[##################################################] 100%。会晓平常时，你像你讲，阮著是跟着你到科尔讲厝里外面食食哪无仝的，毕竟食堂的菜肯定是较歹料吧，你去登
[##################################################] 100%食完了食堂的毋讲食到偌好吧，因为阮遐大家拢按呢讲按怎讲呢？讲失当的就是食的饱，你还想欲食的
[##################################################] 100%好的活，基本上是无啦。主要就是食会饱还是顿时讲好食好食较快无伫迄位啊内面啊，尔啊喙口啊食了阁惊倒出来
[##################################################] 100%，有时候人多阁想多一些麻烦烦也毋要找食堂啊，较方便，较急，应该怎样变作共电机拢正食堂啊，安尼外口啊
[##################################################] 100%，阮离你呃，办公室的省份嘛有好好食的
[##################################################] 100%，也是还有会使吧，大门口出去，我倒酒瓶走啊，是袂讲远到按怎啦走路过去嘛，主要是讲现在热天感觉太热了
[##################################################] 100%，有一次走到厝边的机子才才好好走落进来
耗时: 21.66886305809021

Question: the audio is 71 s long, but inference takes about 21 s. Is this normal? It is much slower than the Paraformer model.
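For reference, the two numbers above can be expressed as a real-time factor (RTF), i.e. processing time divided by audio duration. The snippet below is only a small sketch of that arithmetic; the helper name `real_time_factor` is made up for illustration. It uses the reported 21.67 s and 71 s. Note that the timed region in the script covers the ASR and punctuation calls plus printing, but not the VAD pass, so this is not a pure ASR-model RTF.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures reported in this issue: 21.67 s of processing for 71 s of audio.
rtf = real_time_factor(21.66886305809021, 71.0)
print(f"RTF = {rtf:.3f}")
print(f"Speed = {1.0 / rtf:.2f}x real time")
```

An RTF of roughly 0.3 means the loop runs about three times faster than real time. Running the same calculation on a Paraformer timing from the same machine and audio file would make the comparison concrete.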

qinghuanyyz commented 4 months ago

Hi, I saw that you have used this model. I can no longer use damo/speech_UniASR_asr_2pass-minnan-16k-common-vocab3825 through the pipeline method. Have you run into this?

MyWestCity commented 4 months ago

It is not that the model cannot be used. Try a different version combination: modelscope 1.5.2 and funasr 0.4.4. I tested these versions and inference works.

qinghuanyyz commented 4 months ago

Thank you very much! Inference works with the older versions.

mengyi11452 commented 4 months ago

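I am using the same code and the same versions, but I keep getting an error. Could someone take a look and tell me which version is still wrong?

KeyError: 'funasr-pipeline is not in the pipelines registry group voice-activity-detection. Please make sure the correct version of ModelScope library is used.'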
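Since the error message points at a library version mismatch, it may help to print the versions that are actually installed in the active environment before comparing them with the ones mentioned in this thread. The following is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `installed_versions` and the choice of packages to check are illustrative assumptions, not part of ModelScope or FunASR.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Package list is an assumption; add any other dependency you suspect.
    for name, ver in installed_versions(["modelscope", "funasr", "torch"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

If the printed versions differ from the pair reported as working above (modelscope 1.5.2 with funasr 0.4.4), that difference would be the first thing to rule out.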

modelscope / FunASR

UniASR model inference is slow #1172