Closed: gardaa closed this issue 7 months ago
This is a common PyTorch issue that occurs when the input to the embedding layer contains indices larger than the size of the embedding table. Could you please ensure that the length of the symbols equals the number passed to nn.Embedding(...)?
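For reference, a minimal sketch of what triggers this error (the sizes below are made up, not the actual ones from this repo):

import torch

# An embedding table with num_embeddings=29 only accepts indices 0..28.
spk_emb = torch.nn.Embedding(29, 64)
spk_emb(torch.tensor([0, 5, 28]))   # works
spk_emb(torch.tensor([45]))         # IndexError on CPU; on GPU this surfaces as repeated CUDA device-side assert messages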
I have tried changing it on line 51 of matcha_tts.py (see the code below) to the length of the symbols (which is 216), but I cannot get it to work. Which format should it have, and is that the correct file?
if n_spks > 1:
    self.spk_emb = torch.nn.Embedding(n_spks, spk_emb_dim)
Edit:
The error is now fixed. Because of the nature of the dataset, the speaker IDs were not a contiguous 1-n range but arbitrary numbers, which is what caused the error. So I had to set num_speakers to the maximum speaker ID + 1.
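A rough sketch of the two ways to handle this (variable names here are illustrative, not the actual Matcha-TTS code):

# all_speaker_ids: raw speaker IDs collected from the filelists (example of non-contiguous values)
all_speaker_ids = [3, 17, 254, 981]

# Option used here: size the embedding to cover the largest raw ID (wastes unused rows).
n_spks = max(all_speaker_ids) + 1

# Alternative: remap the raw IDs to a contiguous 0..k-1 range and use that everywhere.
id_map = {spk: i for i, spk in enumerate(sorted(set(all_speaker_ids)))}
n_spks_remapped = len(id_map)   # the embedding then only needs len(id_map) rows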
Hi! I am trying to train the Matcha-TTS model on my own dataset in a low-resource language. Because of that, I have to use some ASR data for my experiments, even though I know it is not the best type of data for TTS training. Since it is a multispeaker dataset, I changed the n_spks variable in dataset.yaml to 29, which is the total number of speakers in the training set (the validation set has 45 different speakers).
From some research, I believe it might have something to do with changing n_spks from 1 to 29, which messes with some indexing or boundaries that have been set, but I am not sure. I did manage to run training without errors when n_spks=1 (even though the results were not great because of the noisy dataset).
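One check that would confirm this (the filelist path and pipe-separated "wav_path|speaker_id|text" layout are assumptions about my setup, not necessarily the repo's exact format):

# Report the speaker ID range in a filelist and compare it against n_spks.
n_spks = 29
ids = []
with open("data/filelists/train.txt", encoding="utf-8") as f:
    for line in f:
        ids.append(int(line.strip().split("|")[1]))
print("speaker IDs span", min(ids), "to", max(ids))
assert max(ids) < n_spks, "some speaker IDs do not fit in an embedding of size n_spks"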
When I run the script to train the model on a GPU, it gives me hundreds of lines with this error:
Followed by:
I am looking forward to any help with this issue. Thank you so much in advance!