Do you have a mistake in the code snippet you shared? `i` iterates over rows and `j` iterates over cols, but then you index as `[i:j]`. Maybe you meant `[i, j]` instead?
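To illustrate the difference in NumPy: `[i:j]` is a slice of rows `i` through `j - 1`, while `[i, j]` picks the single element at row `i`, column `j`. The original snippet isn't shown here, so the array `grid` below is a hypothetical stand-in.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)  # hypothetical 3 rows x 4 cols

i, j = 1, 2
rows_slice = grid[i:j]  # slice of rows i..j-1 -> shape (1, 4)
element = grid[i, j]    # one element at (row i, col j)

# Visiting every element, as a row/col double loop intends:
total = 0
for i in range(grid.shape[0]):      # i iterates over rows
    for j in range(grid.shape[1]):  # j iterates over cols
        total += grid[i, j]         # [i, j], not [i:j]
```

Inside the double loop, `grid[i:j]` would return a (possibly empty) block of whole rows rather than one value, which is usually not what such a loop wants.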
Another general suggestion I have is to train on LJSpeech (or a subset of it) to make sure you're getting reasonable results first. Once you've established a good baseline, move on to your own dataset.
You could try normalizing the mel spectrogram before training.
Here are some scripts you can take a look at:
Find the mean and std of the mel data:
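```python
import glob
import os

import numpy as np
from sklearn.preprocessing import StandardScaler

# write_hdf5(path, key, data) is assumed to be imported from elsewhere (not defined here).

def get_stat(meldir):
    """Compute per-channel mean/std over all .npy mels in meldir and save them."""
    scaler = StandardScaler()
    for path in glob.glob(os.path.join(meldir, "*.npy")):
        # [0] drops a leading dim (assumes (1, n_mels, frames)); .T gives (frames, n_mels).
        mel = np.load(path)[0].transpose()
        scaler.partial_fit(mel)  # running mean/variance, one file at a time
    cur = os.path.dirname(os.path.realpath(__file__))
    stats_path = os.path.join(cur, "stats.h5")
    write_hdf5(stats_path, "mean", scaler.mean_.astype(np.float32))
    write_hdf5(stats_path, "scale", scaler.scale_.astype(np.float32))
```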
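If you'd rather avoid the scikit-learn and `write_hdf5` dependencies, the same per-channel statistics can be accumulated with NumPy alone. This is a sketch swapped in for `StandardScaler.partial_fit`, not the code above; `compute_mel_stats` and the `stats.npz` file name are hypothetical, and it keeps the same assumption that each file holds a `(1, n_mels, frames)` array. It uses the population std and treats (near-)constant channels as scale 1, so it should closely match `scaler.mean_` / `scaler.scale_`.

```python
import glob
import os

import numpy as np


def compute_mel_stats(meldir):
    """Per-channel mean and std over all frames of every .npy mel in meldir."""
    count, total, total_sq = 0, None, None
    for path in sorted(glob.glob(os.path.join(meldir, "*.npy"))):
        # Assumes (1, n_mels, frames) on disk; convert to (frames, n_mels).
        mel = np.load(path)[0].T.astype(np.float64)
        if total is None:
            total = np.zeros(mel.shape[1])
            total_sq = np.zeros(mel.shape[1])
        count += mel.shape[0]
        total += mel.sum(axis=0)
        total_sq += (mel ** 2).sum(axis=0)
    if count == 0:
        raise ValueError(f"no .npy files found in {meldir}")
    mean = total / count
    var = np.maximum(total_sq / count - mean ** 2, 0.0)  # population variance
    scale = np.sqrt(var)
    # Avoid division by zero later for (near-)constant channels.
    scale = np.where(scale < 1e-8, 1.0, scale)
    return mean.astype(np.float32), scale.astype(np.float32)
```

Usage would look like `mean, scale = compute_mel_stats("path/to/mels")` followed by `np.savez("stats.npz", mean=mean, scale=scale)`.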
Then normalize the data with those statistics:
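```python
import torch


class Scaler(torch.nn.Module):
    """Normalize a mel spectrogram with precomputed stats: (x - mean) / scale."""

    def __init__(self, mean, scale):
        super().__init__()
        # torch.as_tensor accepts numpy arrays or tensors alike.
        # Buffers (not Parameters) follow .to(device) and are saved in state_dict,
        # but are never updated even if this module ends up in an optimizer.
        self.register_buffer("mean", torch.as_tensor(mean, dtype=torch.float32))
        self.register_buffer("scale", torch.as_tensor(scale, dtype=torch.float32))

    def forward(self, x):
        # mean/scale have shape (n_mels,), so x's last dim must be n_mels.
        return (x - self.mean) / self.scale
```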
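Whatever normalization is used, the vocoder should receive mels normalized with the same statistics it was trained on, so the saved stats should be reused (not recomputed) at inference. Below is a minimal NumPy sketch of the forward transform that `Scaler` applies plus its inverse; `normalize` and `denormalize` are hypothetical helper names, and the `(frames, n_mels)` layout is assumed to match the stats computed above.

```python
import numpy as np


def normalize(mel, mean, scale):
    """(x - mean) / scale for mel shaped (frames, n_mels)."""
    return (mel - mean) / scale


def denormalize(norm_mel, mean, scale):
    """Undo normalize(): x * scale + mean."""
    return norm_mel * scale + mean
```

For a channel-first `(n_mels, frames)` array, broadcast the stats as `mean[:, None]` and `scale[:, None]` instead.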
I have another problem: I'm trying to use diffwave with tacotron2 (https://github.com/begeekmyfriend/tacotron2), but the generated audio is only noise. The TTS parameters already match diffwave's, and the only difference I found is the mel range, because the preprocessing is different: tacotron2 outputs mels in [-4, 4], while diffwave expects input mels in [0, 1]. I tried the following to solve this:
1. Retraining: use tacotron2's mels to train diffwave. After 800k steps, the output is still only noise.
2. Retraining: change tacotron2's mel range as in (1), then train diffwave. After 350k steps, the output is still only noise.
Do you have any good suggestions?
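A minimal sketch of the linear rescaling from tacotron2's [-4, 4] range to diffwave's [0, 1] range. It assumes both ranges are linear maps of the same log-mel (dB) interval; if the two pipelines use different STFT settings, mel filterbanks, or dB reference/floor values, a linear rescale will not make the features match, and those settings would need to be aligned instead. The function name `tacotron_to_diffwave_range` and the `max_abs` parameter are hypothetical.

```python
import numpy as np


def tacotron_to_diffwave_range(mel, max_abs=4.0):
    """Linearly map a mel in [-max_abs, max_abs] to [0, 1]."""
    # Clip first so any overshoot from the acoustic model stays inside [0, 1].
    mel = np.clip(mel, -max_abs, max_abs)
    return (mel + max_abs) / (2.0 * max_abs)
```

The same function should be applied both to the mels diffwave is trained on and to the mels tacotron2 produces at inference; otherwise the vocoder sees two different distributions.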