modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Offline file recognition with python fastapi raises "choose a window size 400 that is [2, 0]" #2005

Open secslim opened 1 month ago

secslim commented 1 month ago

🐛 Bug

I am running offline file recognition with FunASR/runtime/python/http/server.py, and the server is started with two worker processes:

    uvicorn.run(
        app="fun_test:app",
        host=args.host,
        port=args.port,
        ssl_keyfile=args.keyfile,
        ssl_certfile=args.certfile,
        workers=2,
    )

When two users send requests at the same time, the following error is raised: choose a window size 400 that is [2, 0]

Environment

nixonjin commented 3 weeks ago

I hit the same error:

    assert 2 <= window_size <= len(waveform), "choose a window size {} that is [2, {}]".format(
    AssertionError: choose a window size 400 that is [2, 160]

Environment: OS: Linux; FunASR version: 1.1.6; PyTorch version: 2.3.1
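For reference, a minimal sketch of the arithmetic behind this assertion (the 16 kHz sample rate and 25 ms frame length are FunASR's usual defaults; the 160-sample length is taken from the error message above, so the numbers are illustrative only):

    # kaldi.fbank converts frame_length (ms) into a window size in samples
    # and asserts that the window fits inside the waveform it was given.
    fs = 16000                                       # sample rate (Hz)
    frame_length_ms = 25                             # default frame length
    window_size = int(0.001 * frame_length_ms * fs)  # 400 samples
    waveform_len = 160                               # samples left in the failing chunk
    assert 2 <= window_size <= waveform_len, \
        "choose a window size {} that is [2, {}]".format(window_size, waveform_len)
    # -> AssertionError: choose a window size 400 that is [2, 160]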

nixonjin commented 3 weeks ago

A single worker process does not hit this error; it only shows up with multiple workers. Does anyone know roughly where the problem is and which part of the source needs changing?

nixonjin commented 2 weeks ago

Bug fix. Method 1: upgrade funasr to version 1.1.6

  1. Find the forward method of the WavFrontend class in funasr/frontends/wav_frontend.py
  2. Go to line 137 and, in the kaldi.fbank call, change the code to:

      mat = kaldi.fbank(
          waveform,
          num_mel_bins=self.n_mels,
          # frame_length=self.frame_length,
          frame_length=min(self.frame_length, waveform_length / self.fs * 1000),
          frame_shift=self.frame_shift,
          dither=self.dither,
          energy_floor=0.0,
          window_type=self.window,
          sample_frequency=self.fs,
          snip_edges=self.snip_edges,
      )

This fix worked in my scenario.
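A standalone sketch of the same clamp outside FunASR, using torchaudio's Kaldi-compatible fbank (the 80 mel bins and 10 ms frame shift are assumed values, not read from the model config):

    import torch
    import torchaudio.compliance.kaldi as kaldi

    fs = 16000
    waveform = torch.randn(1, 160)        # only 10 ms of audio, as in the error above
    waveform_length = waveform.shape[1]

    # Unclamped: frame_length=25 ms -> 400-sample window -> the AssertionError above.
    # Clamped:   frame_length = min(25, 160 / 16000 * 1000) = 10 ms -> 160-sample window.
    mat = kaldi.fbank(
        waveform * (1 << 15),
        num_mel_bins=80,
        frame_length=min(25, waveform_length / fs * 1000),
        frame_shift=10,
        sample_frequency=fs,
    )
    print(mat.shape)  # torch.Size([1, 80]): one feature frame instead of a crash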

Method 2: MyWavFrontend. Alternatively, if you do not want to modify the funasr package code directly, you can define a new MyWavFrontend class (the import paths below are assumed from the funasr package layout):

from typing import Tuple

import torch
import torchaudio.compliance.kaldi as kaldi
from torch.nn.utils.rnn import pad_sequence

from funasr.frontends.wav_frontend import WavFrontend, apply_cmvn, apply_lfr
from funasr.register import tables


@tables.register("frontend_classes", "my_wav_frontend")
@tables.register("frontend_classes", "MyWavFrontend")
class MyWavFrontend(WavFrontend):
    """Conventional frontend structure for ASR."""

    def forward(
        self,
        input: torch.Tensor,
        input_lengths,
        **kwargs,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        batch_size = input.size(0)
        feats = []
        feats_lens = []
        for i in range(batch_size):
            waveform_length = input_lengths[i]
            waveform = input[i][:waveform_length]
            if self.upsacle_samples:
                waveform = waveform * (1 << 15)
            waveform = waveform.unsqueeze(0)
            mat = kaldi.fbank(
                waveform,
                num_mel_bins=self.n_mels,
                # frame_length=self.frame_length,
                frame_length=min(self.frame_length, waveform_length / self.fs * 1000),
                frame_shift=self.frame_shift,
                dither=self.dither,
                energy_floor=0.0,
                window_type=self.window,
                sample_frequency=self.fs,
                snip_edges=self.snip_edges,
            )

            if self.lfr_m != 1 or self.lfr_n != 1:
                mat = apply_lfr(mat, self.lfr_m, self.lfr_n)
            if self.cmvn is not None:
                mat = apply_cmvn(mat, self.cmvn)
            feat_length = mat.size(0)
            feats.append(mat)
            feats_lens.append(feat_length)

        feats_lens = torch.as_tensor(feats_lens)
        if batch_size == 1:
            feats_pad = feats[0][None, :, :]
        else:
            feats_pad = pad_sequence(feats, batch_first=True, padding_value=0.0)
        return feats_pad, feats_lens

Then, in the model configuration file speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/config.yaml, change the frontend from WavFrontend to MyWavFrontend.
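One more usage note, as a hedged sketch: the module that defines MyWavFrontend has to be imported before the model is built, otherwise the @tables.register decorator has not run yet when config.yaml refers to "MyWavFrontend". The module name my_wav_frontend, the local model directory, and the audio file name are assumptions for illustration:

    import my_wav_frontend  # noqa: F401  (side effect: registers MyWavFrontend)
    from funasr import AutoModel

    # Point AutoModel at the local model directory whose config.yaml was edited above.
    model = AutoModel(model="./speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch")
    res = model.generate(input="example.wav")
    print(res)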

secslim commented 2 weeks ago

Thanks, I'll give it a try.
