ming024 / FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
MIT License

Should we rely on TensorBoard's output for duration, pitch and energy? #194

Open aidosRepoint opened 1 year ago

aidosRepoint commented 1 year ago

Hi!

In model.module, line 128, we have:

        if duration_target is not None:
            x, mel_len = self.length_regulator(x, duration_target, max_len)
            duration_rounded = duration_target
        else:
            duration_rounded = torch.clamp(
                (torch.round(torch.exp(log_duration_prediction) - 1) * d_control),
                min=0,
            )
            x, mel_len = self.length_regulator(x, duration_rounded, max_len)
            mel_mask = get_mask_from_lengths(mel_len, self.device)

This means that in training mode duration_target is always provided, so the first branch is always taken and duration_rounded is just a copy of duration_target. Does that mean VarianceAdaptor's forward method always returns the ground-truth durations during training? And if so, does that mean the loss calculation is wrong?
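
To make the question concrete, here is a minimal sketch of the quoted branch. It is not the repository's code: plain Python lists stand in for tensors, and `length_regulate` and `variance_adaptor_durations` are hypothetical names made up for illustration. It only shows which values each branch hands back, assuming the raw `log_duration_prediction` is returned alongside `duration_rounded`.

```python
import math


def length_regulate(x, durations):
    """Toy stand-in for the length regulator: repeat each item by its duration."""
    out = []
    for item, d in zip(x, durations):
        out.extend([item] * int(d))
    return out, len(out)


def variance_adaptor_durations(x, log_duration_prediction,
                               duration_target=None, d_control=1.0):
    """Hypothetical, list-based version of the branch quoted above."""
    if duration_target is not None:
        # Training path: expand with the ground-truth durations.
        expanded, mel_len = length_regulate(x, duration_target)
        duration_rounded = duration_target
    else:
        # Inference path: round(exp(log_d) - 1) * d_control, clamped at 0.
        duration_rounded = [
            max(round(math.exp(ld) - 1) * d_control, 0)
            for ld in log_duration_prediction
        ]
        expanded, mel_len = length_regulate(x, duration_rounded)
    # Assumption: the raw prediction is passed through unchanged in both paths.
    return expanded, log_duration_prediction, duration_rounded, mel_len


phonemes = ["a", "b", "c"]
log_pred = [math.log(3), math.log(2), math.log(5)]  # i.e. durations 2, 1, 4

# Training-style call: duration_rounded is simply the target.
_, _, train_rounded, train_len = variance_adaptor_durations(
    phonemes, log_pred, duration_target=[1, 2, 1]
)

# Inference-style call: duration_rounded comes from the prediction.
_, _, infer_rounded, infer_len = variance_adaptor_durations(phonemes, log_pred)
```

In this sketch the training call does return the target as `duration_rounded`, but `log_duration_prediction` is still a separate output. Whether the loss is affected therefore depends on which of the two the loss function consumes, which the snippet quoted above does not show. If the loss uses `log_duration_prediction`, only values logged from `duration_rounded` (for example in TensorBoard plots) would mirror the ground truth.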