lmnt-com / diffwave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Apache License 2.0

How to match tacotron2? #26

Closed: Ziyan0829 closed this issue 2 years ago

Ziyan0829 commented 2 years ago

I have another problem: I am trying to pair DiffWave with Tacotron2 (https://github.com/begeekmyfriend/tacotron2), but the generated audio is only noise. The TTS parameters already match DiffWave's. The only difference I found is the mel range, because the preprocessing is different: Tacotron2 outputs mels in the range [-4, 4], while DiffWave expects input mels in the range [0, 1]. I tried the following to solve this:

[Image]

Do you have any good suggestions?
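If both models start from the same dB-scaled mel and differ only in the final normalization step, the two ranges are related by a single affine map, x' = (x + 4) / 8, and no explicit loops are needed. This is an assumption: it holds only if Tacotron2 rescales symmetrically to [-4, 4] while DiffWave clamps to [0, 1] from the same underlying values. If the pipelines also differ in dB floor, reference level, or filterbank, this rescaling alone will not line the spectrograms up. The helper name `tacotron_to_diffwave` is hypothetical.

```python
import numpy as np


def tacotron_to_diffwave(mel, max_abs_value=4.0):
    """Hypothetical helper: map a mel in [-max_abs_value, max_abs_value] to [0, 1].

    Assumes both models normalize the same dB-scaled mel and differ only in
    this final affine rescaling.
    """
    mel = np.asarray(mel, dtype=np.float32)
    # Shift [-m, m] to [0, 2m], then divide by 2m to land in [0, 1].
    mel01 = (mel + max_abs_value) / (2.0 * max_abs_value)
    # Clip so out-of-range predictions stay inside DiffWave's [0, 1] input range.
    return np.clip(mel01, 0.0, 1.0)
```

Applied to a Tacotron2 output array, `tacotron_to_diffwave(taco_mel)` returns an array of the same shape, with -4 mapped to 0, 0 to 0.5, and 4 to 1.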

sharvil commented 2 years ago

Is there a mistake in the code snippet you shared? `i` iterates over rows and `j` iterates over columns, but then you index with `[i:j]`. Maybe you meant `[i, j]` instead?

Another general suggestion I have is to train on LJSpeech (or a subset of it) to make sure you're getting reasonable results first. Once you've established a good baseline, move on to your own dataset.
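To make the indexing point in the reply above concrete: in NumPy, `a[i:j]` is a slice of rows `i` through `j - 1`, while `a[i, j]` is the single element at row `i`, column `j`. The small array below is a made-up stand-in for a (rows, cols) mel, and `rescale_loop` is a hypothetical name. The loop only computes an element-wise rescale when it indexes with `[i, j]`, and the vectorized expression gives the same result without any loops.

```python
import numpy as np

# Made-up (rows, cols) array standing in for a mel spectrogram.
mel = np.arange(12, dtype=np.float32).reshape(3, 4)

i, j = 1, 2
row_slice = mel[i:j]  # rows i..j-1: a (1, 4) sub-array, not a single value
element = mel[i, j]   # the single value at row i, column j


def rescale_loop(a, offset=4.0, span=8.0):
    """Element-wise rescale with explicit loops, indexing with [row, col]."""
    out = np.empty_like(a)
    for row in range(a.shape[0]):
        for col in range(a.shape[1]):
            out[row, col] = (a[row, col] + offset) / span
    return out


# The same computation without loops; broadcasting covers every element.
vectorized = (mel + 4.0) / 8.0
```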

v-nhandt21 commented 2 years ago

You could try normalizing the mel spectrograms before training.

Here are some scripts you can take a look at.

Compute the mean and standard deviation of the mel data:

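```python
import glob
import os

import numpy as np
from sklearn.preprocessing import StandardScaler
from parallel_wavegan.utils import write_hdf5  # assumed source of write_hdf5 (ParallelWaveGAN); not defined here


def get_stat(meldir):
    """Fit per-mel-bin mean and standard deviation over every .npy file in meldir."""
    scaler = StandardScaler()
    for path in glob.glob(os.path.join(meldir, "*.npy")):
        # Assumes each file holds an array of shape (1, n_mels, frames);
        # StandardScaler expects (frames, n_mels), hence the transpose.
        mel = np.load(path)[0].transpose()
        scaler.partial_fit(mel)

    # Write the statistics next to this script.
    cur = os.path.dirname(os.path.realpath(__file__))
    write_hdf5(os.path.join(cur, "stats.h5"), "mean", scaler.mean_.astype(np.float32))
    write_hdf5(os.path.join(cur, "stats.h5"), "scale", scaler.scale_.astype(np.float32))
```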
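For reference, what `StandardScaler.partial_fit` accumulates here is a per-mel-bin mean and (population) standard deviation over all frames of all files. The sketch below computes the same quantities with NumPy alone, using running sums, so it can be checked without scikit-learn or HDF5. `streaming_mel_stats` is a hypothetical name, and each input is assumed to already be in (frames, n_mels) layout. Running sums of squares are less numerically robust than the incremental update scikit-learn uses, and scikit-learn also replaces a zero variance with a scale of 1, which this sketch does not do.

```python
import numpy as np


def streaming_mel_stats(mels):
    """Hypothetical helper: per-bin mean and std over all frames of all utterances.

    Each item in `mels` is assumed to be shaped (frames, n_mels).
    """
    count = 0
    total = None
    total_sq = None
    for mel in mels:
        mel = np.asarray(mel, dtype=np.float64)
        if total is None:
            total = np.zeros(mel.shape[1])
            total_sq = np.zeros(mel.shape[1])
        count += mel.shape[0]
        total += mel.sum(axis=0)          # running sum per mel bin
        total_sq += (mel ** 2).sum(axis=0)  # running sum of squares per mel bin
    if count == 0:
        raise ValueError("no frames to compute statistics from")
    mean = total / count
    # Population variance, E[x^2] - E[x]^2, clamped against tiny negative round-off.
    var = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean.astype(np.float32), np.sqrt(var).astype(np.float32)
```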

Apply the normalization to the data:

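```python
import torch


class Scaler(torch.nn.Module):
    """Standardizes mels with precomputed per-bin statistics (e.g. from stats.h5)."""

    def __init__(self, mean, scale):
        super().__init__()
        # torch.as_tensor accepts NumPy arrays, lists, or tensors.
        # Buffers follow .to(device) and are saved in state_dict, but unlike
        # nn.Parameter they are not updated by the optimizer.
        self.register_buffer("mean", torch.as_tensor(mean, dtype=torch.float32))
        self.register_buffer("scale", torch.as_tensor(scale, dtype=torch.float32))

    def forward(self, x):
        # mean/scale have shape (n_mels,) and broadcast against the LAST axis
        # of x, so x is expected as (..., frames, n_mels).
        return (x - self.mean) / self.scale
```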
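One practical detail when using these statistics with DiffWave: assuming mels laid out as (n_mels, frames), which is how DiffWave's model consumes them, statistics of shape (n_mels,) would broadcast against the frame axis instead of the mel axis. The minimal NumPy sketch below adds a trailing axis to the statistics and includes the inverse mapping needed to undo the normalization. Both function names are hypothetical, and presumably the same mean and scale would need to be applied to the Tacotron2 targets and to the DiffWave conditioning mels so that both models see the same range.

```python
import numpy as np


def normalize_mel(mel, mean, scale):
    """Hypothetical helper: standardize a (n_mels, frames) mel per mel bin."""
    # mean/scale are (n_mels,); [:, None] makes them (n_mels, 1) so they
    # broadcast across frames rather than being matched against the frame axis.
    return (mel - mean[:, None]) / scale[:, None]


def denormalize_mel(mel_norm, mean, scale):
    """Inverse of normalize_mel: map standardized values back to the original range."""
    return mel_norm * scale[:, None] + mean[:, None]


# Usage with made-up data: stats computed from the mel itself.
rng = np.random.default_rng(0)
mel = rng.uniform(0.0, 1.0, size=(80, 200)).astype(np.float32)
mean, scale = mel.mean(axis=1), mel.std(axis=1)
mel_norm = normalize_mel(mel, mean, scale)
restored = denormalize_mel(mel_norm, mean, scale)
```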