Hi, I tried your code for training in both multi-speaker and single-speaker settings, and it worked well for both training and inference. I then made some minor changes to model.py, modifying the forward and infer functions to replace the speaker ID with speaker embeddings from a pre-trained speaker recognition model:
        if n_speakers > 1:
            self.emb_g = nn.Embedding(n_speakers, gin_channels)
            self.linlin = nn.Linear(768, gin_channels)

    def forward(self, x, x_lengths, y, y_lengths, sid=None):
        if self.n_speakers > 0:
            # g = self.emb_g(sid).unsqueeze(-1)  # [b, h, 1]
            g = sid  # sid now holds a [b, 768] speaker embedding, not an ID
            g = self.linlin(g).unsqueeze(-1)  # [b, h, 1]
        else:
            g = None
        ....

    def infer(
        self,
        x,
        x_lengths,
        sid=None,
        noise_scale=1,
        length_scale=1,
        noise_scale_w=1.0,
        max_len=None,
    ):
        if self.n_speakers > 0:
            # g = self.emb_g(sid).unsqueeze(-1)  # [b, h, 1]
            g = sid  # sid now holds a [b, 768] speaker embedding, not an ID
            g = self.linlin(g).unsqueeze(-1)  # [b, h, 1]
        else:
            g = None
        x, m_p, logs_p, x_mask = self.enc_p(x, x_lengths, g=g)
        ....
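In isolation, the conditioning path I changed looks roughly like the sketch below. It only checks shapes. `SpeakerProjection` and the value 256 for `gin_channels` are assumptions made for illustration and are not names or settings from the repo; only the 768-dimensional input and the `[b, h, 1]` output shape come from my change.

```python
import torch
from torch import nn


class SpeakerProjection(nn.Module):
    """Hypothetical stand-in for the `linlin` layer: maps an external
    speaker embedding to the conditioning tensor `g`."""

    def __init__(self, emb_dim: int = 768, gin_channels: int = 256):
        super().__init__()
        # gin_channels=256 is an assumed value, not taken from the config.
        self.linlin = nn.Linear(emb_dim, gin_channels)

    def forward(self, spk_emb: torch.Tensor) -> torch.Tensor:
        # spk_emb: [b, emb_dim] -> g: [b, gin_channels, 1]
        return self.linlin(spk_emb).unsqueeze(-1)


proj = SpeakerProjection()
spk_emb = torch.randn(2, 768)  # random placeholder, not a real speaker embedding
g = proj(spk_emb)
print(g.shape)
```

So the shape of `g` matches what `self.emb_g(sid).unsqueeze(-1)` produced before the change.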
Training runs fine and the losses appear to converge. A recent log line:
[2.560758352279663, 2.2537946701049805, 3.8962457180023193, 20.862136840820312, 0.8815252184867859, 2.2975285053253174, 24100, 0.00019459892692329838]
But at inference the audio quality is very bad: the output captures the speaker's style, but none of the words are intelligible.
Do you have any idea what might cause this?
Sample:
Text: Scarcely had he uttered the name when Pierre's closing eyes shot open
Audio: https://drive.google.com/file/d/1OtWPVw82alLTV3n4i7e9kb1FaKPJ0OBW/view?usp=sharing