hubertsiuzdak / snac

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
https://hubertsiuzdak.github.io/snac/
MIT License

Support for streaming inference #20

Open ojus1 opened 2 months ago

ojus1 commented 2 months ago

Is it possible to use pretrained weights for predicting codes in a chunk-wise fashion (streaming input audio)?

MrWaterZhou commented 1 week ago

Sliding window should be enough:

```python
import torch
import torchaudio


def sliding_window(data, window_size=21, step=7):
    # Overlapping windows of 21 codes, advancing 7 codes at a time.
    return [data[i:i + window_size] for i in range(0, len(data) - window_size + 1, step)]


# `output_ids` (flat list of codes) and `decode` (codes -> audio tensor of
# shape [batch, channels, samples]) are assumed to be defined elsewhere.
id_list = sliding_window(output_ids)
pcm_list = []
for i, l in enumerate(id_list):
    audio_hat = decode(l)
    # First chunk keeps its leading 2048 samples, last chunk keeps its tail,
    # middle chunks keep only the central 2048 samples.
    start = 0 if i == 0 else 2048
    end = None if i == len(id_list) - 1 else 2048 * 2
    pcm_list.append(audio_hat[:, :, start:end])
pcm = torch.cat(pcm_list, dim=-1)
torchaudio.save('stream_test.wav', pcm[0].cpu(), 24000)
```
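To sanity-check the bookkeeping in the snippet above, here is a small torch-free sketch. It only verifies that the trimmed windows cover every frame exactly once. `fake_decode` is a hypothetical stand-in for the real decoder, and `SAMPLES_PER_FRAME` is shrunk from 2048 to 4 for readability. The assumption that 7 codes map to one 2048-sample frame is read off the snippet, not confirmed by the library.

```python
CODES_PER_FRAME = 7      # step size used in the snippet above (assumed: one frame)
SAMPLES_PER_FRAME = 4    # small stand-in for 2048
WINDOW_FRAMES = 3        # window_size=21 codes -> 3 frames


def sliding_window(data, window_size, step):
    return [data[i:i + window_size]
            for i in range(0, len(data) - window_size + 1, step)]


def fake_decode(codes):
    """Hypothetical decoder: each frame of codes becomes SAMPLES_PER_FRAME
    copies of that frame's first code, so frames stay identifiable."""
    out = []
    for f in range(0, len(codes), CODES_PER_FRAME):
        out.extend([codes[f]] * SAMPLES_PER_FRAME)
    return out


def stream_decode(codes):
    windows = sliding_window(codes, WINDOW_FRAMES * CODES_PER_FRAME, CODES_PER_FRAME)
    pcm = []
    for i, w in enumerate(windows):
        audio = fake_decode(w)
        # Same trimming rule as above: keep the head of the first window,
        # the tail of the last one, and only the middle frame otherwise.
        start = 0 if i == 0 else SAMPLES_PER_FRAME
        end = len(audio) if i == len(windows) - 1 else 2 * SAMPLES_PER_FRAME
        pcm.extend(audio[start:end])
    return pcm


codes = list(range(6 * CODES_PER_FRAME))  # 6 frames of dummy codes
streamed = stream_decode(codes)
full = fake_decode(codes)
print(streamed == full)
```

With a real decoder the streamed audio will not be bit-identical to a one-shot decode, since each window sees only one frame of context on either side. The sketch checks the indexing, not the audio quality. Note also that emitting the middle frame of each window implies one frame of lookahead latency.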