p0p4k / pflowtts_pytorch

Unofficial implementation of NVIDIA P-Flow TTS paper
https://neurips.cc/virtual/2023/poster/69899
MIT License
214 stars 30 forks

AlignerNet #22

Closed Tera2Space closed 8 months ago

Tera2Space commented 8 months ago

So I noticed the aligner is commented out in the pflow code. Did you comment it out because it didn't work, or because it didn't improve quality much?

p0p4k commented 8 months ago

I left it in for curious people to try.

Tera2Space commented 8 months ago

Yep, I tried it, but the loss got stuck at around 6. I think it's because we feed it the full output of the encoder (mu) instead of only the output of text_base_encoder (like in NaturalSpeech). Could that be the reason?

p0p4k commented 8 months ago

I am not sure I understand your question. If you edit and paste the code here, I might understand better.

Tera2Space commented 8 months ago

We calculate the alignments:

    aln_hard, aln_soft, aln_log, aln_mask = self.aligner(mu_x.transpose(1,2), x_mask, y, y_mask)

mu_x is the output of the encoder, which already carries information about the speech prompt:

    mu_x, logw, x_mask = self.encoder(x, x_lengths, prompt_slice)

so I think using x_emb from the text encoder may work better as the input for the aligner:

    x_emb = self.text_base_encoder(x_emb, x_emb_mask)
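
Roughly, the change I have in mind would look like this (just a sketch; it assumes x_emb has the same (B, T_x, C) shape as mu_x and shares the same x_mask):

    # hypothetical change: feed the prompt-free text embedding to the aligner
    x_emb = self.text_base_encoder(x_emb, x_emb_mask)  # no speech-prompt information mixed in
    aln_hard, aln_soft, aln_log, aln_mask = self.aligner(
        x_emb.transpose(1, 2), x_mask, y, y_mask        # instead of mu_x.transpose(1, 2)
    )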

p0p4k commented 8 months ago

If MAS can align it with mu_x, AlignerNet should be able to do it too. It will just take longer to converge.

Tera2Space commented 8 months ago

And one last question: for AlignerNet to work, I only need to uncomment:

        # self.aligner = Aligner(
        #     dim_in=encoder.encoder_params.n_feats,
        #     dim_hidden=encoder.encoder_params.n_feats,
        #     attn_channels=encoder.encoder_params.n_feats,
        #     )

        # self.aligner_loss = ForwardSumLoss()
        # self.bin_loss = BinLoss()
        # self.aligner_bin_loss_weight = 0.0

and

        # aln_hard, aln_soft, aln_log, aln_mask = self.aligner(
        #     mu_x.transpose(1,2), x_mask, y, y_mask
        #     )
        # attn = aln_mask.transpose(1,2).unsqueeze(1)
        # align_loss = self.aligner_loss(aln_log, x_lengths, y_lengths)
        # if self.aligner_bin_loss_weight > 0.:
        #     align_bin_loss = self.bin_loss(aln_mask, aln_log, x_lengths) * self.aligner_bin_loss_weight
        #     align_loss = align_loss + align_bin_loss
        # dur_loss = F.l1_loss(logw, attn.sum(2))
        # dur_loss = dur_loss + align_loss

and comment out the MAS usage, right?
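
Putting it together, I would expect it to end up roughly like this (just a sketch; it assumes the MAS-based computation of attn is commented out in the same spot and that y_lengths is in scope there):

    # in __init__: AlignerNet and its losses, uncommented
    self.aligner = Aligner(
        dim_in=encoder.encoder_params.n_feats,
        dim_hidden=encoder.encoder_params.n_feats,
        attn_channels=encoder.encoder_params.n_feats,
    )
    self.aligner_loss = ForwardSumLoss()
    self.bin_loss = BinLoss()
    self.aligner_bin_loss_weight = 0.0

    # in the forward pass: AlignerNet produces attn in place of MAS
    aln_hard, aln_soft, aln_log, aln_mask = self.aligner(
        mu_x.transpose(1, 2), x_mask, y, y_mask
    )
    attn = aln_mask.transpose(1, 2).unsqueeze(1)
    align_loss = self.aligner_loss(aln_log, x_lengths, y_lengths)
    if self.aligner_bin_loss_weight > 0.:
        align_bin_loss = self.bin_loss(aln_mask, aln_log, x_lengths) * self.aligner_bin_loss_weight
        align_loss = align_loss + align_bin_loss
    dur_loss = F.l1_loss(logw, attn.sum(2))
    dur_loss = dur_loss + align_loss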

p0p4k commented 8 months ago

Yes, correct.

Tera2Space commented 8 months ago

Great, thanks a lot for the answers!