StoryHack closed 8 months ago
Hi @StoryHack, to finetune a single-speaker model on a multi-speaker dataset, you need to select the action "convert single speaker to multi speaker model" instead of "finetune".
That is the one I had selected.
No, you didn't select that, because the log says:
DEBUG:piper_train:Resuming from single-speaker checkpoint: /content/pretrained.ckpt
It should say something like:
DEBUG:piper_train:Converting single speaker to multi/speaker model.
I had the correct option selected when I ran step 4. I'm not sure why the selection doesn't stay.
Maybe it is due to the dataset. Is it in LJSpeech format? Is its structure file|speaker_id|text?
This is how it looks now. Does the speaker_id need to be numerical?
...
./wavs/Karen_Savage_152.wav|Karen_Savage|Her complexion was quite faultless, much to her mother's satisfaction.
./wavs/Karen_Savage_153.wav|Karen_Savage|I'm so glad to have one daughter who can wear pink, Mrs. Blythe was wont to say jubilantly
./wavs/Karen_Savage_154.wav|Karen_Savage|Diana Blythe, known as Di, was very like her mother, with grey-green eyes that always shone with the peculiar lustre and brilliancy in the dusk, and red hair.
...
Maybe you should exclude "./".
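A quick way to sanity-check the metadata before training is a small script like the one below. It is only a sketch: `parse_metadata_line` and `check_metadata` are hypothetical helper names, not part of Piper, and it assumes the file|speaker_id|text layout discussed above. Whether the preprocessing step actually requires the "./" prefix to be removed, or the speaker IDs to be numeric, is not confirmed in this thread. The script just normalizes the path and reports which speakers it finds.

```python
def parse_metadata_line(line: str) -> tuple[str, str, str]:
    """Split one 'file|speaker_id|text' row (assumed layout) into its fields.

    A leading './' is dropped from the path, following the suggestion above.
    """
    # maxsplit=2 keeps any further '|' characters inside the text field
    parts = line.rstrip("\n").split("|", 2)
    if len(parts) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(parts)}: {line!r}")
    path, speaker, text = (p.strip() for p in parts)
    if path.startswith("./"):
        path = path[2:]
    return path, speaker, text


def check_metadata(lines):
    """Parse every non-empty line; return the rows and the sorted speaker names."""
    rows = [parse_metadata_line(line) for line in lines if line.strip()]
    speakers = sorted({speaker for _, speaker, _ in rows})
    return rows, speakers


sample = [
    "./wavs/Karen_Savage_152.wav|Karen_Savage|Her complexion was quite faultless, much to her mother's satisfaction.",
]
rows, speakers = check_metadata(sample)
print(rows[0][0])   # path with the './' prefix removed
print(speakers)     # distinct speaker IDs found in the file
```

If the list of speakers contains only one name, the dataset would effectively be single-speaker regardless of the format.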
I am using the Colab training notebook for training/finetuning. I am using a multi-speaker dataset and finetuning starting from the lessac high quality checkpoint. Everything seems to go fine until I hit step 5 of the notebook. When I run it, I get an error. Full output below. None of the previous steps generated any kind of error.