modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

RuntimeError: "round_cuda" not implemented for 'Long' #2189

Closed lukeewin closed 1 week ago

lukeewin commented 1 week ago

Notice: In order to resolve issues more efficiently, please raise the issue following the template and fill in the details.

❓ Questions and Help

Inference fails with RuntimeError: "round_cuda" not implemented for 'Long'

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

Inference fails with RuntimeError: "round_cuda" not implemented for 'Long'. How can I resolve this? Has anyone run into the same problem, and how did you solve it? The code runs fine on Windows 11, but inference on Ubuntu 22.04 raises this error.

Code

What have you tried?

Inference works on Windows; after switching to Ubuntu 22.04, it fails with RuntimeError: "round_cuda" not implemented for 'Long'.

What's your code?

import argparse
import soundfile
import os

from funasr import AutoModel

parser = argparse.ArgumentParser()
parser.add_argument("--asr_model_online_revision", type=str, default="v2.0.4", help="")
parser.add_argument(
    "--asr_model_online",
    type=str,
    default="/root/autodl-tmp/funasr/FunASR/model",
    help="model from modelscope",
)
parser.add_argument("--ngpu", type=int, default=1, help="0 for cpu, 1 for gpu")
parser.add_argument("--device", type=str, default="cuda", help="cuda, cpu")
parser.add_argument("--ncpu", type=int, default=4, help="cpu cores")
args = parser.parse_args()

model = AutoModel(
    model=args.asr_model_online,
    model_revision=args.asr_model_online_revision,
    ngpu=args.ngpu,
    ncpu=args.ncpu,
    device=args.device,
    disable_pbar=True,
    disable_log=True,
    disable_update=True
)

def infer1(scp_file: str):
    final_result_list = []
    with open(scp_file, 'r', encoding='utf-8') as f:
        line = f.readline()  # NOTE: only the first line of the scp file is processed
        wav_file = line.split(' ')[1].strip()  # scp line format: "<utt_id> <wav_path>"
        chunk_size = [0, 10, 5]  # [0, 10, 5] 600ms, [0, 8, 4] 480ms
        encoder_chunk_look_back = 4  # number of chunks to lookback for encoder self-attention
        decoder_chunk_look_back = 1  # number of encoder chunks to lookback for decoder cross-attention
        speech, sample_rate = soundfile.read(wav_file)
        chunk_stride = chunk_size[1] * 960  # 600ms

        res_txt = []
        cache = {}
        total_chunk_num = int((len(speech) - 1) / chunk_stride) + 1
        for i in range(total_chunk_num):
            speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
            is_final = i == total_chunk_num - 1
            res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size,
                                 encoder_chunk_look_back=encoder_chunk_look_back,
                                 decoder_chunk_look_back=decoder_chunk_look_back)
            # print(res[0]['text'])
            res_txt.append(res[0]['text'])
        final_res = ''.join(res_txt)
        if final_res is None or final_res.strip() == '':
            final_res = '*'
        print(final_res)
        final_result_list.append(final_res)
    return final_result_list

# Batch inference
# wav_file_dir: directory containing the wav files to transcribe
# return: list of transcription results, one entry per file
def infer(wav_file_dir: str):
    final_result_list = []
    chunk_size = [0, 10, 5]  # [0, 10, 5] 600ms, [0, 8, 4] 480ms
    encoder_chunk_look_back = 4  # number of chunks to lookback for encoder self-attention
    decoder_chunk_look_back = 1  # number of encoder chunks to lookback for decoder cross-attention
    # if the directory exists, walk every file under it
    if os.path.isdir(wav_file_dir):
        for root, dirs, files in os.walk(wav_file_dir):
            for file in files:
                wav_file = os.path.join(root, file)

                # wav_file = os.path.join(model.model_path, "example/asr_example.wav")
                speech, sample_rate = soundfile.read(wav_file)
                chunk_stride = chunk_size[1] * 960  # 600ms

                res_txt = []
                cache = {}
                total_chunk_num = int((len(speech) - 1) / chunk_stride) + 1
                for i in range(total_chunk_num):
                    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
                    is_final = i == total_chunk_num - 1
                    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size,
                                         encoder_chunk_look_back=encoder_chunk_look_back,
                                         decoder_chunk_look_back=decoder_chunk_look_back)
                    # print(res[0]['text'])
                    res_txt.append(res[0]['text'])
                final_res = ''.join(res_txt)
                print(final_res)
                final_result_list.append(final_res)
    return final_result_list
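The intended chunk arithmetic in the streaming loops above can be checked in isolation. A minimal sketch (`num_chunks` is my own helper name, not part of FunASR), assuming 16 kHz audio and `chunk_size = [0, 10, 5]`, so each stride covers 10 × 960 = 9600 samples (600 ms):

```python
def num_chunks(n_samples: int, chunk_stride: int) -> int:
    """Ceiling-style chunk count for splitting a waveform into strides."""
    return int((n_samples - 1) / chunk_stride) + 1

stride = 10 * 960  # 600 ms at 16 kHz

print(num_chunks(9600, stride))   # exactly one stride -> 1 chunk
print(num_chunks(16000, stride))  # 1 s of audio -> 2 chunks (second one partial)
```

Note the parentheses: `(len(speech) - 1) / chunk_stride`, not `len((speech) - 1)` — with a NumPy array the latter silently computes `len(speech - 1)` and over-counts by one chunk when the length is an exact multiple of the stride.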

LauraGPT commented 1 week ago

We suggest upgrading to torch>=1.13.
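For context: this error typically means an integer (Long) tensor reached a rounding op on the GPU, which older torch builds do not implement for integral dtypes. Besides upgrading torch as suggested, a defensive workaround on the caller side is to make sure the waveform handed to `model.generate` is already floating point. A minimal sketch (`to_float32` is a hypothetical helper, not part of FunASR):

```python
import numpy as np

def to_float32(speech: np.ndarray) -> np.ndarray:
    """Cast a waveform to float32; integer PCM is rescaled to roughly [-1, 1]."""
    if np.issubdtype(speech.dtype, np.integer):
        return speech.astype(np.float32) / np.iinfo(speech.dtype).max
    return speech.astype(np.float32)

# Usage in the scripts above, before the chunking loop:
#   speech, sample_rate = soundfile.read(wav_file)
#   speech = to_float32(speech)
```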