Closed · johnpaulbin closed this 1 year ago
Example usage code:

```python
original_target_mel = output["mel_outputs_postnet"]
if not cpu_run:
    original_target_mel = original_target_mel.cuda()

speaker_ids = torch.LongTensor([0]).cuda()
inputs = (
    text_padded,
    input_lengths,
    original_target_mel,
    torch.LongTensor([0]).cuda(),
    torch.LongTensor([0]).cuda(),
    speaker_ids,
    embedding,
)
attn = taco.get_alignment(inputs)
noattention_output = taco.inference_noattention(
    text_padded, input_lengths, speaker_ids, embedding, attn.transpose(0, 1)
)
y_g_hat = hifigan.vocoder.forward(
    torch.tensor(
        noattention_output["mel_outputs_postnet"], dtype=torch.float, device=device
    )
)
audio = y_g_hat.reshape(1, -1)
audio = audio * 32768.0
Audio(audio.cpu().detach().numpy(), rate=22050)
```
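A side note on the `audio * 32768.0` step: it scales the float waveform (roughly in `[-1, 1]`) up to the 16-bit PCM range before playback or saving. A minimal sketch of that scaling in NumPy, with clipping added so peak samples don't wrap around when cast to `int16` (the clipping is an assumption on my part, not something the snippet above does):

```python
import numpy as np

# Dummy float waveform in [-1, 1], standing in for the vocoder output.
audio = np.array([0.0, 0.5, -0.5, 1.0, -1.0], dtype=np.float32)

# Scale to the 16-bit PCM range; clip first so +1.0 doesn't overflow int16.
pcm = np.clip(audio * 32768.0, -32768, 32767).astype(np.int16)

print(pcm.tolist())  # → [0, 16384, -16384, 32767, -32768]
```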
looks good to me
Add `inference_noattention()` to models/tacotron2 for easier inferencing. Fixes `get_alignment()` by allowing GSTs.
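For intuition on what "no-attention" inference means here: the decoder is driven by a precomputed alignment matrix (like the one `get_alignment` returns) instead of computing attention weights on the fly, so each decoder step's context is just a fixed weighted sum of encoder outputs. A minimal sketch of that idea with hypothetical shapes (this is not the repo's implementation, just the underlying mechanism):

```python
import numpy as np

# Hypothetical sizes: T_dec decoder steps, T_enc encoder steps, E channels.
T_dec, T_enc, E = 4, 3, 2
rng = np.random.default_rng(0)
encoder_outputs = rng.standard_normal((T_enc, E))

# Precomputed alignment: each decoder step attends fully to one encoder step
# (rows are attention distributions over encoder steps, summing to 1).
attn = np.zeros((T_dec, T_enc))
attn[np.arange(T_dec), [0, 1, 1, 2]] = 1.0

# With the alignment fixed, the per-step context is a plain matrix product;
# no attention weights are computed during decoding.
context = attn @ encoder_outputs  # shape (T_dec, E)

assert context.shape == (T_dec, E)
assert np.allclose(context[0], encoder_outputs[0])
```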