hadaev8 opened 4 years ago
Here are the generated samples: https://drive.google.com/drive/folders/1e4xHQ3XX180QFF2aDBEDwu-lVE9e47_g?usp=sharing The voice does not sound natural. What do you think, could training on 8 GPUs make it worse?
I added a stress embedding because stress is very important in Russian. These are my changes:
```python
# Embedding tables: index 0 is padding, so its row stays zero
self.emb = nn.Embedding(n_vocab, hidden_channels, padding_idx=0)
nn.init.normal_(self.emb.weight[1:], 0.0, hidden_channels**-0.5)
self.stress_emb = nn.Embedding(3, hidden_channels, padding_idx=0)
nn.init.normal_(self.stress_emb.weight[1:], 0.0, hidden_channels**-0.5)
...
# Input: sum of symbol and stress embeddings, then scaling
x = self.emb(x) + self.stress_emb(stress)
x = x * math.sqrt(self.hidden_channels)  # [b, t, h]
```
Any advice?
Could you please share the steps you followed for Russian? How many hours of speech did you use?
Well, I switched the tokenization to Cyrillic symbols and added the stress embedding shown above. I used 40 hours of data.
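The thread does not say how stress is annotated in the data, so here is a minimal sketch of the tokenization side under loudly stated assumptions. It assumes stress is marked with a combining acute accent (U+0301) right after the stressed vowel, and that stress ids are 0 = padding, 1 = unstressed, 2 = stressed, which is one reading of `nn.Embedding(3, hidden_channels, padding_idx=0)`. The symbol inventory and the names `SYMBOLS`, `text_to_ids` and `pad_batch` are hypothetical, not taken from the original code.

```python
# Hypothetical symbol inventory: lowercase Cyrillic letters plus some punctuation.
# Index 0 is reserved for padding, matching padding_idx=0 in both embeddings.
PAD = "_"
SYMBOLS = [PAD] + list("абвгдеёжзийклмнопрстуфхцчшщъыьэюя") + list(" .,!?-")
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}

# Assumed annotation: combining acute accent placed after the stressed vowel.
COMBINING_ACUTE = "\u0301"

# Assumed stress ids: 0 = padding, 1 = unstressed, 2 = stressed.
UNSTRESSED, STRESSED = 1, 2


def text_to_ids(text):
    """Return parallel lists (symbol_ids, stress_ids) for one utterance."""
    symbol_ids, stress_ids = [], []
    prev_kept = False  # True if the previous character produced a symbol id
    for ch in text.lower():
        if ch == COMBINING_ACUTE:
            # The stress mark applies to the symbol emitted just before it.
            if prev_kept:
                stress_ids[-1] = STRESSED
            continue
        if ch in SYMBOL_TO_ID:
            symbol_ids.append(SYMBOL_TO_ID[ch])
            stress_ids.append(UNSTRESSED)
            prev_kept = True
        else:
            # Characters outside the inventory are dropped.
            prev_kept = False
    return symbol_ids, stress_ids


def pad_batch(seqs):
    """Right-pad id sequences with 0 (the padding index) to equal length."""
    max_len = max(len(s) for s in seqs)
    return [s + [0] * (max_len - len(s)) for s in seqs]
```

For example, `text_to_ids("приве\u0301т")` yields six symbol ids with stress ids `[1, 1, 1, 1, 2, 1]`. After `pad_batch`, the two lists would be passed as `x` and `stress` to the embedding code above; keeping them aligned one-to-one is what lets the two embeddings be summed position by position.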