How to match tacotron2?

Ziyan0829 commented 2 years ago

I have another problem: I am trying to use diffwave with tacotron2 (https://github.com/begeekmyfriend/tacotron2), but the generated audio contains only noise. The TTS parameters already match diffwave's. The only difference I found is the mel range, because the preprocessing is different: Tacotron2's output mel range is [-4, 4], while diffwave's input mel range is [0, 1]. So I tried the following to solve the problem.

1. Change only the inference: rescale Tacotron2's mel range to [0, 1], as in the figure. The result improved: I can hear a human voice and some of the content, but the speaker's timbre is lost and the voice sounds robotic.
2. Retrain: train diffwave on Tacotron2's mels. After 800k steps, the output is still only noise.
3. Retrain: rescale Tacotron2's mel range as in (1), then train diffwave. After 350k steps, the output is still only noise.
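For reference, the range change described in (1) can be written as a single vectorized linear map. This is a minimal sketch, not the code from the figure: the function name `rescale_mel` and the sample values are hypothetical, and it only aligns the value range. It assumes both pipelines compute the same underlying log-mel (same reference level and dB floor), which the thread does not confirm.

```python
import numpy as np


def rescale_mel(mel, src_min=-4.0, src_max=4.0, dst_min=0.0, dst_max=1.0):
    """Linearly map mel values from [src_min, src_max] to [dst_min, dst_max]."""
    mel = np.asarray(mel, dtype=np.float32)
    # Clip first so out-of-range predictions cannot leave the target range.
    mel = np.clip(mel, src_min, src_max)
    unit = (mel - src_min) / (src_max - src_min)
    return unit * (dst_max - dst_min) + dst_min


# Hypothetical Tacotron2 output with shape (n_mels, frames).
taco_mel = np.array([[-4.0, 0.0, 4.0], [2.0, -2.0, 5.0]], dtype=np.float32)
diffwave_mel = rescale_mel(taco_mel)
```

If the two preprocessing pipelines differ in more than the output range, a linear map like this will not be enough on its own.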

Do you have any good suggestions?

sharvil commented 2 years ago

Is there a mistake in the code snippet you shared? i iterates over rows and j iterates over columns, but then you index as [i:j]. Maybe you meant [i, j] instead?
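To illustrate the difference, here is a small sketch. The original snippet is not reproduced in this thread, so the loop below is a hypothetical reconstruction of its shape, using a [-4, 4] to [0, 1] rescale as the loop body.

```python
import numpy as np

mel = np.array([[-4.0, 0.0], [4.0, 2.0]], dtype=np.float32)
rows, cols = mel.shape

# Buggy form: mel[i:j] is a slice of rows i..j-1, not the element (i, j).
# Depending on i and j the slice is empty or covers whole rows, so some
# values are never rescaled and others may be rescaled more than once.
buggy = mel.copy()
for i in range(rows):
    for j in range(cols):
        buggy[i:j] = (buggy[i:j] + 4.0) / 8.0

# Intended form: index one element with [i, j].
fixed = mel.copy()
for i in range(rows):
    for j in range(cols):
        fixed[i, j] = (fixed[i, j] + 4.0) / 8.0

# Equivalent vectorized form, with no loops at all.
vectorized = (mel + 4.0) / 8.0
```

In this 2x2 example the second row of `buggy` is left untouched, while `fixed` and `vectorized` agree.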

Another general suggestion I have is to train on LJSpeech (or a subset of it) to make sure you're getting reasonable results first. Once you've established a good baseline, move on to your own dataset.

v-nhandt21 commented 2 years ago

You could try normalizing the mel spectrograms before training.

Here are some scripts you can take a look at.

Find the mean and std of the mel data:

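import glob
import os

import numpy as np
from sklearn.preprocessing import StandardScaler
from parallel_wavegan.utils import write_hdf5  # assumed source of this helper


def get_stat(meldir):
    # Accumulate the per-bin mean and std over every mel file in meldir.
    scaler = StandardScaler()
    for path in glob.glob(os.path.join(meldir, "*.npy")):
        # Take the first item of each file and transpose it so that
        # rows are frames and columns are mel bins.
        mel = np.load(path)[0].transpose()
        scaler.partial_fit(mel)

    # Save the statistics next to this script.
    cur = os.path.dirname(os.path.realpath(__file__))
    stats_path = os.path.join(cur, "stats.h5")
    write_hdf5(stats_path, "mean", scaler.mean_.astype(np.float32))
    write_hdf5(stats_path, "scale", scaler.scale_.astype(np.float32))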

Normalize the data with those statistics:

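import numpy as np
import torch


class Scaler(torch.nn.Module):
    def __init__(self, mean, scale):
        super().__init__()
        # Accept either NumPy arrays or tensors for the statistics.
        if isinstance(mean, np.ndarray):
            mean = torch.from_numpy(mean)
        if isinstance(scale, np.ndarray):
            scale = torch.from_numpy(scale)

        # Parameters are trainable by default; pass requires_grad=False
        # to torch.nn.Parameter if the statistics should stay fixed.
        self.mean = torch.nn.Parameter(mean)
        self.scale = torch.nn.Parameter(scale)

    def forward(self, x):
        # x must have the mel bins on its last axis to broadcast correctly.
        return (x - self.mean) / self.scale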
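If the vocoder is trained on normalized mels, the same statistics also have to be applied to the mels fed in at inference time. The sketch below shows the normalization and its inverse in plain NumPy. The random data, the 80 mel bins, and the helper names `normalize` and `denormalize` are hypothetical and used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical training mels, shape (frames, n_mels), values in [-4, 4].
train_mel = rng.uniform(-4.0, 4.0, size=(1000, 80)).astype(np.float32)

# Per-bin statistics. StandardScaler computes the same mean and
# (population) standard deviation, accumulated over partial_fit calls.
mean = train_mel.mean(axis=0)
scale = train_mel.std(axis=0)


def normalize(mel):
    """Standardize each mel bin to zero mean and unit variance."""
    return (mel - mean) / scale


def denormalize(mel):
    """Undo normalize(), returning values in the original range."""
    return mel * scale + mean


normalized = normalize(train_mel)
restored = denormalize(normalized)
```

Here `normalized` has roughly zero mean and unit standard deviation per bin, and `restored` recovers the original values up to floating-point error.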
