modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

No audio output when using "SOND speaker diarization - Chinese - AliMeeting - 16k - offline" #611

Closed Remember2015 closed 1 year ago

Remember2015 commented 1 year ago

runtime

funasr version

model

code

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
import numpy as np

# Initialize the inference pipeline.
# When raw audio is used as input, use the config file sond.yaml and set mode to sond_demo.
inference_diar_pipeline = pipeline(
    mode="sond_demo",
    num_workers=0,
    task=Tasks.speaker_diarization,
    diar_model_config="sond.yaml",
    model='damo/speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch',
    model_revision="v1.0.5",
    output_dir='./output',
    sv_model="damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch",
    sv_model_revision="v1.2.2",
)

# Use audio_list as input: the first audio is the speech to be analyzed, and the
# following audios are the voiceprint enrollment utterances of the different speakers.
audio_list = [
    "./2-16000.wav",
    "./2-person-16000.wav",
]

results = inference_diar_pipeline(audio_in=audio_list)
print(results)

output

2023-06-09 11:20:34,255 - modelscope - INFO - PyTorch version 1.11.0+cu113 Found.
2023-06-09 11:20:34,260 - modelscope - INFO - Loading ast index from /mnt/workspace/.cache/modelscope/ast_indexer
2023-06-09 11:20:34,297 - modelscope - INFO - Loading done! Current index file version is 1.6.1, with md5 c661f1c586a773fd9e04a6031d0d6d1e and a total number of 849 components indexed
2023-06-09 11:20:35,745 - modelscope - INFO - Use user-specified model revision: v1.0.5
2023-06-09 11:20:36,044 - modelscope - INFO - initiate model from /mnt/workspace/.cache/modelscope/damo/speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch
2023-06-09 11:20:36,044 - modelscope - INFO - initiate model from location /mnt/workspace/.cache/modelscope/damo/speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch.
2023-06-09 11:20:36,045 - modelscope - INFO - initialize model from /mnt/workspace/.cache/modelscope/damo/speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch
2023-06-09 11:20:36,048 - modelscope - WARNING - No preprocessor field found in cfg.
2023-06-09 11:20:36,048 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2023-06-09 11:20:36,048 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/mnt/workspace/.cache/modelscope/damo/speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch'}. trying to build by task and model information.
2023-06-09 11:20:36,048 - modelscope - WARNING - No preprocessor key ('generic-sv', 'speaker-diarization') found in PREPROCESSOR_MAP, skip building preprocessor.
2023-06-09 11:20:36,490 - modelscope - INFO - Use user-specified model revision: v1.2.2
2023-06-09 11:20:36,786 - modelscope - INFO - loading speaker verification model from /mnt/workspace/.cache/modelscope/damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch ...
2023-06-09 11:20:44,023 - modelscope - INFO - Speaker Diarization Processing: ['./2-16000.wav', './2-person-16000.wav'] ...
2023-06-09 11:20:44,023 (speaker_diarization_pipeline:234) INFO: Speaker Diarization Processing: ['./2-16000.wav', './2-person-16000.wav'] ...
/root/FunASR/funasr/models/encoder/resnet34_encoder.py:56: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  ilens = (ilens + 1) // self.stride
/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:3704: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
{'text': 'spk1 [(0.0, 70.32)]'}

question

After running the official example as shown above, I did not get any separated audio. I also set output_dir explicitly, but that had no effect. I'm not sure what the cause is, and no error was reported.

lyblsgo commented 1 year ago

This is a speaker diarization model: its output is timestamps of who speaks when, not separated audio, as it is not a speech separation model. Maybe https://www.modelscope.cn/models/damo/speech_mossformer_separation_temporal_8k/summary can help.
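For reference, the diarization result string shown in the output above (e.g. `spk1 [(0.0, 70.32)]`) can be parsed into per-speaker (start, end) segments, which you could then use to cut the waveform yourself with any audio library. A minimal parsing sketch, assuming the result keeps this `spkN [(start, end), ...]` format (the function name and regexes are my own, not part of FunASR):

```python
import re

def parse_diarization(text):
    """Parse a SOND result string like 'spk1 [(0.0, 70.32)] spk2 [(3.5, 10.0)]'
    into a dict mapping each speaker label to a list of (start, end) seconds."""
    segments = {}
    # Each speaker label is followed by a bracketed list of (start, end) tuples.
    for spk, body in re.findall(r"(spk\d+)\s*\[([^\]]*)\]", text):
        pairs = re.findall(r"\(([\d.]+),\s*([\d.]+)\)", body)
        segments[spk] = [(float(s), float(e)) for s, e in pairs]
    return segments

result = {'text': 'spk1 [(0.0, 70.32)]'}
print(parse_diarization(result['text']))
# {'spk1': [(0.0, 70.32)]}
```

With these segments you could, for example, slice the 16 kHz waveform at `int(start * 16000):int(end * 16000)` samples per speaker, but that only isolates time spans; separating overlapped speakers still requires a separation model such as the one linked above.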