marlon-br opened this issue 4 years ago
Add these two hparams:
"n_speakers": 10,
"gin_channels": 16
I'm not sure what the ideal value of gin_channels is for a rich embedding; I asked about that in another thread.
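A minimal sketch of applying those two hparams programmatically. It assumes the config is a JSON file like base.json and that the model hparams live under a "model" section that is passed to the model constructor; the section name is an assumption, so check your copy of the config.

```python
import json


def add_speaker_hparams(config: dict, n_speakers: int, gin_channels: int) -> dict:
    """Return a copy of config with the multi-speaker hparams set.

    Assumption: model hparams live under config["model"].
    """
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    model = updated.setdefault("model", {})
    model["n_speakers"] = n_speakers
    model["gin_channels"] = gin_channels
    return updated
```

Usage: load base.json with json.load, pass the dict through add_speaker_hparams(cfg, n_speakers=10, gin_channels=16), and json.dump the result to a new config file so the original stays untouched.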
Your training data and validation CSVs should be in this format:
filename|numeric_speaker_id|transcript
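Before training, it can help to validate the filelists. The sketch below assumes each speaker id is used as an index into an embedding table with n_speakers rows, so valid ids run from 0 to n_speakers - 1; the function and type names are hypothetical.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    filename: str
    speaker_id: int
    transcript: str


def parse_filelist(lines: List[str], n_speakers: int) -> List[Entry]:
    """Parse 'filename|numeric_speaker_id|transcript' lines and validate ids."""
    entries = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Require exactly three fields; a stray '|' in a transcript is flagged
        # here instead of silently shifting fields.
        parts = line.split("|")
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(parts)}")
        filename, sid, transcript = parts
        speaker_id = int(sid)  # raises ValueError if the id is not numeric
        # Assumed: ids index an embedding with n_speakers rows.
        if not 0 <= speaker_id < n_speakers:
            raise ValueError(
                f"line {lineno}: speaker id {speaker_id} outside [0, {n_speakers})"
            )
        entries.append(Entry(filename, speaker_id, transcript))
    return entries
```

With n_speakers = 10, a line using speaker id 10 is rejected, which catches the common off-by-one of numbering speakers from 1.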
You'll need to swap out the loader:
-from data_utils import TextMelLoader, TextMelCollate
+from data_utils import TextMelSpeakerLoader, TextMelSpeakerCollate
You'll also need to change the forward function to accept the g (speaker id) parameter and unpack the speaker ids from the loader enumerations.
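The loop change can be sketched as below. It assumes the speaker collate function yields a fifth element (the speaker ids) after text, text lengths, mel and mel lengths, and that the generator accepts the speaker ids through a keyword argument g alongside gen; those keyword names should be checked against the model code. The stub generator and loader are hypothetical stand-ins so the pattern runs without torch or the repo.

```python
from typing import Any, Callable, Iterable, List, Tuple

Batch = Tuple[Any, Any, Any, Any, Any]


def run_epoch(generator: Callable[..., Any], train_loader: Iterable[Batch]) -> List[Any]:
    """Sketch of the multi-speaker training loop shape."""
    outputs = []
    # Before: for batch_idx, (x, x_lengths, y, y_lengths) in enumerate(train_loader):
    for batch_idx, (x, x_lengths, y, y_lengths, sid) in enumerate(train_loader):
        # Forward the speaker ids as g so the model can look up each
        # utterance's speaker embedding.
        out = generator(x, x_lengths, y, y_lengths, g=sid, gen=False)
        outputs.append(out)
        # ...loss computation, backward pass and optimizer step would go here...
    return outputs


if __name__ == "__main__":
    # Hypothetical stand-in for the real model: echoes what it receives.
    def stub_generator(x, x_lengths, y, y_lengths, g=None, gen=False):
        return {"g": g, "gen": gen}

    fake_loader = [("x0", [5], "y0", [80], [3]), ("x1", [7], "y1", [90], [1])]
    for out in run_epoch(stub_generator, fake_loader):
        print(out)
```

The same idea presumably applies at inference: if g is left out, the model is called without speaker conditioning even though it was built with gin_channels > 0.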
I meant without retraining the whole model again, only adding one more voice.
Sorry for jumping in, but could you please elaborate on the last part about changing the forward function? Thanks in advance!
Hi @echelon, this information is really useful. I believe I've made the necessary changes you suggested. In my case I've set n_speakers = 24 and gin_channels = 256, and the rest of the parameters in base.json are unchanged. There are 9102 samples in the training records. I'm getting the runtime error below:
RuntimeError: Given groups=1, weight of size 256 448 3, expected input[1, 192, 89] to have 448 channels, but got 192 channels instead
Can you please advise what is going wrong here?
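One hedged reading of that error: 448 = 192 + 256. If, as assumed here, hidden_channels is 192 in base.json and the duration predictor's first convolution is built with hidden_channels + gin_channels input channels whenever gin_channels > 0, then the model was built for speaker conditioning but received only the 192 encoder channels. That is what you would expect if the speaker ids never reached the model as g. The sketch below only encodes this arithmetic; the layer layout and the cause are assumptions to check, not a confirmed diagnosis.

```python
def expected_dp_in_channels(hidden_channels: int, gin_channels: int) -> int:
    """Input channels the (assumed) duration predictor conv is built with."""
    # Assumption: the speaker embedding is concatenated to the encoder output
    # along the channel axis only when gin_channels > 0.
    return hidden_channels + gin_channels if gin_channels > 0 else hidden_channels


def diagnose(hidden_channels: int, gin_channels: int, got: int) -> str:
    """Compare the channel count in the error with the assumed expectation."""
    expected = expected_dp_in_channels(hidden_channels, gin_channels)
    if got == expected:
        return "ok"
    if got == hidden_channels and gin_channels > 0:
        return "speaker embedding missing: check that g (speaker ids) is passed"
    return f"unexpected channel count: expected {expected}, got {got}"


print(expected_dp_in_channels(192, 256))  # → 448
print(diagnose(192, 256, got=192))
```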
Hi @marlon-br, @dechubby, were you able to run in multi-speaker mode? Did you make any other changes apart from those mentioned by echelon? I'm getting an issue that I'm not able to debug.
Any help will be really appreciated.
Regards, Prasanta
Hi Jaehyeon,
Could you please provide instructions on how to use the pretrained model and add a new speaker voice?
I have created a Google Colab file based on your work: https://github.com/marlon-br/glow-tts-colab
Now I want to add the possibility of having more speaker voices.