Open ojus1 opened 2 months ago
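Sliding window should be enough:

```python
import torch
import torchaudio

# `output_ids` (the predicted codes) and `decode` (the pretrained decoder)
# are assumed to be defined elsewhere.

def sliding_window(data, window_size=21, step=7):
    # Overlapping windows; trailing items that do not fill a window are dropped.
    return [data[i:i + window_size] for i in range(0, len(data) - window_size + 1, step)]

id_list = sliding_window(output_ids)
pcm_list = []
for i, l in enumerate(id_list):
    audio_hat = decode(l)
    if i == 0:
        # first chunk: keep the first 2 * 2048 samples
        pcm_list.append(audio_hat[:, :, :2048 * 2])
    elif i < len(id_list) - 1:
        # middle: keep only samples 2048 to 2 * 2048
        pcm_list.append(audio_hat[:, :, 2048:2048 * 2])
    else:
        # last chunk: keep everything from sample 2048 onward
        pcm_list.append(audio_hat[:, :, 2048:])
pcm_list = torch.cat(pcm_list, dim=-1)
torchaudio.save('stream_test.wav', pcm_list[0].cpu(), 24000)
```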
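To check the trimming arithmetic without loading the model, here is a minimal pure-Python sketch of the same stitching. It is not the model's real decoder: `fake_decode`, `stitch`, and `samples_per_token` are hypothetical names, and the stand-in decoder simply repeats each token a fixed number of times. It also assumes `window_size == 3 * step`, which matches the 21/7 defaults above.

```python
def sliding_window(data, window_size=21, step=7):
    return [data[i:i + window_size] for i in range(0, len(data) - window_size + 1, step)]

def fake_decode(ids, samples_per_token=4):
    # Hypothetical stand-in for the real decoder: each token becomes
    # `samples_per_token` identical "samples".
    return [t for t in ids for _ in range(samples_per_token)]

def stitch(ids, window_size=21, step=7, samples_per_token=4):
    # The trimming below only lines up when a window spans exactly three steps.
    assert window_size == 3 * step
    windows = sliding_window(ids, window_size, step)
    hop = step * samples_per_token  # samples contributed by one step
    out = []
    for i, window in enumerate(windows):
        chunk = fake_decode(window, samples_per_token)
        # The first window keeps its leading third, the last keeps its trailing
        # third, and every window keeps its middle third.
        start = 0 if i == 0 else hop
        end = len(chunk) if i == len(windows) - 1 else 2 * hop
        out.extend(chunk[start:end])
    return out

ids = list(range(35))
stitched = stitch(ids)
print(stitched == fake_decode(ids))
```

With this stand-in decoder the stitched output equals decoding the whole sequence at once, as long as `len(ids) - window_size` is a multiple of `step`; otherwise the trailing tokens that do not fill a window are dropped. A real decoder may produce slightly different samples near window edges, so this only verifies the indexing.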
Is it possible to use pretrained weights for predicting codes in a chunk-wise fashion (streaming input audio)?