snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector
MIT License

Block when using multiprocess #464

Closed panxin801 closed 2 weeks ago

panxin801 commented 2 weeks ago

I use multiprocessing with the spawn start method to start more than one Process running silero-vad:

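import librosa
import numpy as np
import torch

# vad_model and get_speech_timestamps are defined elsewhere in the script.


def post_process(ref_wave_path: str, svc_wave: np.ndarray):
    """Zero out the parts of svc_wave where the ref audio has no speech.

    Args:
        ref_wave_path (str): Path of ref audio.
        svc_wave (np.ndarray): Samples of svc audio (an array, not a path).
    Returns:
        tuple: The masked svc_wave and its sampling rate (32000).
    """
    ref_wave, _ = librosa.load(ref_wave_path, sr=16000)
    tmp_wave = torch.from_numpy(ref_wave).squeeze(0)
    tag_wave = get_speech_timestamps(
        tmp_wave, vad_model, threshold=0.2, sampling_rate=16000
    )

    # Reuse ref_wave as a 0/1 speech mask at 16 kHz.
    ref_wave[:] = 0
    for tag in tag_wave:
        ref_wave[tag["start"]: tag["end"]] = 1

    # Repeat every mask sample twice so the mask lines up with 32 kHz audio.
    ref_wave = np.repeat(ref_wave, 2, -1)

    min_len = min(len(ref_wave), len(svc_wave))
    ref_wave = ref_wave[:min_len]
    svc_wave = svc_wave[:min_len]
    svc_wave[ref_wave == 0] = 0
    return svc_wave, 32000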

I find that it gets stuck at the "get_speech_timestamps" call. Have you ever run into this problem before? Thanks for your reply.

snakers4 commented 2 weeks ago

If the VAD object is created in the main process, it may experience locks, because it is a Python reference to a C++ object.

Ideally, the correct way is to init the VAD in each process from scratch, or to use some form of messaging architecture, or some thread or multiprocessing executors.