modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

Confusion about the code logic in auto_model.py #1827

Closed zzk2021 closed 3 months ago

zzk2021 commented 3 months ago
    for j, _ in enumerate(range(0, n)):
        sample_length = sorted_data[j][0][1] - sorted_data[j][0][0]
        potential_batch_length = max(max_len_in_batch, sample_length) * (j + 1 - beg_idx)
        # keep growing the current batch while both thresholds still hold
        if (
            j < n - 1
            and sample_length < batch_size_threshold_ms
            and potential_batch_length < batch_size
        ):
            max_len_in_batch = max(max_len_in_batch, sample_length)
            end_idx += 1
            results_sorted.extend([[]])
            continue
        # flush the current batch: slice and pad the audio for samples in [beg_idx, end_idx)
        speech_j, speech_lengths_j = slice_padding_audio_samples(
            speech, speech_lengths, sorted_data[beg_idx:end_idx]
        )
        # drop empty segments before running inference
        speech_j_copy = speech_j.copy()
        speech_j = []
        for item in speech_j_copy:
            if item.numel() != 0:
                speech_j.append(item)

        results = self.inference(
            speech_j, input_len=None, model=model, kwargs=kwargs, **cfg
        )
        if self.spk_model is not None:
            # compose vad segments: [[start_time_sec, end_time_sec, speech], [...]]
            for _b in range(len(speech_j)):
                vad_segments = [
                    [
                        sorted_data[beg_idx:end_idx][_b][0][0] / 1000.0,
                        sorted_data[beg_idx:end_idx][_b][0][1] / 1000.0,
                        np.array(speech_j[_b]),
                    ]
                ]
                segments = sv_chunk(vad_segments)
                all_segments.extend(segments)
                speech_b = [i[2] for i in segments]
                spk_res = self.inference(
                    speech_b, input_len=None, model=self.spk_model, kwargs=kwargs, **cfg
                )
                results[_b]["spk_embedding"] = spk_res[0]["spk_embedding"]
        beg_idx = end_idx
        end_idx += 1
        max_len_in_batch = sample_length

This loop contains a continue. If the continue branch is taken, then the code that follows,

    restored_data = [0] * n
    for j in range(n):
        index = sorted_data[j][1]
        restored_data[index] = results_sorted[j]

will inevitably raise an IndexError, because the length of results_sorted never reaches n. How should an error like this be handled?
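
For reference, a minimal self-contained sketch of the failure mode I mean (the values below are made up, not taken from auto_model.py): when results_sorted ends up shorter than n, the restore loop indexes past its end.

    # hypothetical values, only to reproduce the reported failure
    n = 3                                                      # three sorted samples
    sorted_data = [((0, 100), 2), ((0, 80), 0), ((0, 50), 1)]  # (segment, original index)
    results_sorted = [{"text": "a"}, {"text": "b"}]            # only two results collected

    restored_data = [0] * n
    for j in range(n):
        index = sorted_data[j][1]
        restored_data[index] = results_sorted[j]  # IndexError when j == 2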

xingchensong commented 3 months ago

This is not the latest code; please check against the latest code.

zzk2021 commented 3 months ago

Thanks for the reply. I did check it, but I am not using the logic from the current code. In the latest code, when the length of results_sorted has not reached n it simply passes, but for my streaming use case that would skip some sentences, so I modified it myself.
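
One possible shape of such a workaround, written here only as a sketch under my own assumptions rather than the actual patch: after each flushed batch, pad the batch results with empty placeholders so that results_sorted stays the same length as sorted_data and the later restore loop never runs out of entries.

    # sketch (assumed, not the actual modification): keep results_sorted aligned
    # with sorted_data by padding each flushed batch with empty placeholders
    def pad_batch_results(results, beg_idx, end_idx):
        # inference may return fewer results than the batch holds, e.g. when
        # empty segments were filtered out; pad with empty dicts to keep alignment
        results = list(results)
        missing = (end_idx - beg_idx) - len(results)
        return results + [{} for _ in range(missing)]

    # toy usage with made-up values
    results_sorted = []
    results_sorted.extend(pad_batch_results([{"text": "hello"}], beg_idx=0, end_idx=3))
    print(len(results_sorted))  # 3, matching the number of samples in the batch

With every position filled, the restore loop no longer raises an IndexError, and downstream code can decide how to treat the empty entries instead of silently dropping sentences.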