r9y9 / deepvoice3_pytorch

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
https://r9y9.github.io/deepvoice3_pytorch/
Other

How to resume training? Also how to bias/weight the pronunciation to 2nd speaker? #103

Closed: ryancwalsh closed this issue 5 years ago

ryancwalsh commented 6 years ago

This project of yours is AMAZING!

Thank you so much for offering this!

I have 370 of my own short (0-10 seconds) audio clips and transcriptions (totaling 15 minutes).

I'm running your program overnight right now to see if I can use the LJSpeech checkpoint 20180505_deepvoice3_checkpoint_step000640000.pth as a starting point and then train it further on my own recordings.

If I want the resulting TTS voice to sound 100% like the voice of my new recordings and 0% like Linda Johnson, how can I do that?

I see the replace_pronunciation_prob variable in deepvoice3_ljspeech.json. Would setting it to 1.0 lead to the result I want?

Also, if my computer crashes or otherwise aborts training, how can I resume from where it left off?

(I found https://github.com/r9y9/deepvoice3_pytorch/blob/master/train.py#L15, but I'm not sure what that means or how to use it.)

Thank you so much :-)

G-Wang commented 6 years ago

Hello, you'll want to look at the Speaker Adaptation section of the project README (under advanced usage), since you will be adapting the model trained on LJSpeech to a new voice.

I don't think you have enough training data to get very good voice adaptation; it will also depend on which words you're saying. I was able to adapt to a pretty decent British male voice with about 1.5 hours of speech (audiobook data).

If you're using your own recorded voice, you should try to record phonetically balanced speech, such as the Harvard Sentences (https://www.cs.columbia.edu/~hgs/audio/harvard.html). You can also mix up their words to create more sentences.

ryancwalsh commented 6 years ago

Thanks, @G-Wang! I'm definitely willing to record more audio to make it sound better; that won't be a problem. And I love the idea of the Harvard Sentences.

Do you know the answers to my two questions?

  1. If I want the resulting TTS voice to sound 100% like the voice of my new recordings and 0% like Linda Johnson, how can I do that? (I see the replace_pronunciation_prob variable in deepvoice3_ljspeech.json. Would setting it to 1.0 lead to the result I want?)

  2. Also, if my computer crashes or otherwise aborts training, how can I resume from where it left off? (I found https://github.com/r9y9/deepvoice3_pytorch/blob/master/train.py#L15, but I'm not sure what that means or how to use it. Python keeps crashing with an error message like the one pasted below.)

Thanks.

15it [00:03,  4.73it/s]Save intermediate states at step 29000
Saved checkpoint: checkpoints_mine\checkpoint_step000029000.pth
17it [00:24,  1.43s/it]
Loss: 0.1382137738606509
6it [00:01,  3.92it/s]Exception ignored in: <bound method Image.__del__ of <tkinter.PhotoImage object at 0x0000025F915F1320>>
Traceback (most recent call last):
  File "C:\code\Anaconda\envs\tensorflow\lib\tkinter\__init__.py", line 3364, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
[... the same "Exception ignored in" traceback repeats for 25 more tkinter.PhotoImage objects ...]
Exception ignored in: <bound method Image.__del__ of <tkinter.PhotoImage object at 0x0000025F871D2C18>>
Traceback (most recent call last):
  File "C:\code\Anaconda\envs\tensorflow\lib\tkinter\__init__.py", line 3364, in __del__
    self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Tcl_AsyncDelete: async handler deleted by the wrong thread

ryancwalsh commented 6 years ago

@G-Wang @r9y9

I think that, after re-reading the README a few more times, I may understand the answer to my question #2: the first time I run train.py to use Linda Johnson as the basis for a new speaker, I should pass something like --restore-parts="20180505_deepvoice3_checkpoint_step000640000.pth", and then if training crashes (which I'd love to learn how to prevent), I can resume with --checkpoint=checkpoints_my_custom_voice\checkpoint_step000029000.pth. That is what I'll try next.
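A minimal sketch of that resume logic, assuming checkpoints are named like checkpoint_step000029000.pth (the pattern in the log above) and that train.py takes --checkpoint=<path> to resume and --restore-parts=<path> to start from the pretrained model, as described in this comment. The helpers latest_checkpoint and resume_args are hypothetical, not part of the repository:

```python
import re
from pathlib import Path
from typing import List, Optional

# Assumed naming scheme, taken from the log above: checkpoint_step000029000.pth
CHECKPOINT_RE = re.compile(r"checkpoint_step(\d+)\.pth")


def latest_checkpoint(checkpoint_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest step number, or None if there is none."""
    best_step, best_path = -1, None
    for path in Path(checkpoint_dir).glob("checkpoint_step*.pth"):
        match = CHECKPOINT_RE.fullmatch(path.name)  # skips names with extra suffixes
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


def resume_args(checkpoint_dir: str, restore_parts: str) -> List[str]:
    """Hypothetical helper: resume from the newest checkpoint if one exists,
    otherwise start from the pretrained LJSpeech model via --restore-parts."""
    latest = latest_checkpoint(checkpoint_dir)
    if latest is not None:
        return [f"--checkpoint={latest}"]
    return [f"--restore-parts={restore_parts}"]
```

Appending the result of resume_args("checkpoints_mine", "20180505_deepvoice3_checkpoint_step000640000.pth") to the other train.py arguments gives one launch command that works both for the first run and for a restart after a crash.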

I searched the repo with grep --include=\*.py -Rl "replace_pronunciation_prob" ., which returned:

./hparams.py
./synthesis.py
./train.py

I see in hparams.py that replace_pronunciation_prob is: "Replace words to its pronunciation with fixed probability. 0 means no replacement happens."

But I still don't know what that means. It seems like it is NOT related to my goal about making the new resulting speaker sound most like the 2nd input speaker (and not very much at all like Linda Johnson).
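Going by the hparams description, replace_pronunciation_prob acts on the text side of training rather than on which speaker the model sounds like: with a fixed probability p, each word in a transcription is swapped for its phonetic spelling. The toy sketch below illustrates that idea only; the three-word TOY_LEXICON and the curly-brace {HH AH0 L OW1} notation are assumptions for illustration, not necessarily what the repository's frontend produces:

```python
import random
from typing import Dict, Optional

# Hypothetical mini pronunciation dictionary (ARPAbet); a real setup would use a
# full lexicon such as CMUdict.
TOY_LEXICON: Dict[str, str] = {
    "hello": "HH AH0 L OW1",
    "world": "W ER1 L D",
    "speech": "S P IY1 CH",
}


def mix_pronunciation(text: str, p: float, rng: Optional[random.Random] = None) -> str:
    """Replace each known word with its {ARPAbet} pronunciation with probability p.

    p = 0 leaves the text unchanged; p = 1 replaces every word found in the lexicon.
    Words missing from the lexicon are always kept as plain text.
    """
    rng = rng or random.Random()
    out = []
    for word in text.split(" "):
        phonemes = TOY_LEXICON.get(word.lower())
        if phonemes is not None and rng.random() < p:
            out.append("{" + phonemes + "}")
        else:
            out.append(word)
    return " ".join(out)
```

Because speaker identity comes from the audio rather than from this text transformation, setting the value to 1.0 would not by itself make the output sound more like the new speaker.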

Thoughts on this?

And any ideas on how to prevent crashes?

Thanks so much.

ryancwalsh commented 6 years ago

I just noticed* that https://github.com/r9y9/deepvoice3_pytorch#trouble-shooting might be related to the error I've been seeing, so next I will try MPLBACKEND=Qt5Agg python train.py ...
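The VAR=value prefix before a command is POSIX-shell syntax; Windows shells such as cmd.exe (the paths in the log above are Windows paths) do not understand it, so there the variable has to be set another way, for example with set MPLBACKEND=Qt5Agg before running train.py. A cross-platform alternative is to launch training from a small Python script that puts MPLBACKEND into the child process environment. This sketch assumes the trouble-shooting fix is simply choosing a different matplotlib backend; Qt5Agg needs PyQt5 installed, while the non-GUI Agg backend needs no GUI toolkit at all, which may be enough if train.py only writes its plots to files. run_with_backend is a hypothetical helper:

```python
import os
import subprocess
from typing import List


def run_with_backend(cmd: List[str], backend: str = "Qt5Agg") -> subprocess.CompletedProcess:
    """Run cmd with MPLBACKEND set for the child process only.

    No shell syntax is involved, so this behaves the same on Windows and POSIX.
    Raises subprocess.CalledProcessError if the command exits with a non-zero code.
    """
    env = dict(os.environ, MPLBACKEND=backend)  # copy; the parent env is untouched
    return subprocess.run(cmd, env=env, check=True)
```

For example, run_with_backend([sys.executable, "train.py", "--preset=presets/deepvoice3_ljspeech.json"], backend="Agg") would start training with the backend already set (after import sys).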

(*I had been focusing on the line Exception ignored in: <bound method Image.__del__ of <tkinter.PhotoImage object rather than on RuntimeError: main thread is not in main loop.)

Still curious about how to favor the newer custom voice instead of Linda Johnson.

I'm so excited about this project. :-) This is the coolest thing I've played with in a while!

G-Wang commented 6 years ago

To adapt the voice to sound exactly like yours, you just need to provide sufficient training data; there's no need to change replace_pronunciation_prob or similar settings (assuming, of course, that you are recording your speech in English).

For speaker adaptation to any voice, you need to prepare a large enough dataset that looks just like the LJSpeech data. The LJSpeech format is a folder containing a wavs subfolder and a metadata.csv file, where each line has the format some_audio|I have 10 sentences|I have ten sentences: the audio file name (without the .wav extension), followed by its transcription and the normalized transcription. (For the best results, trim the silence before and after each sentence in your audio files.)
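A minimal sketch of writing and checking such a dataset, assuming the LJSpeech conventions: pipe-separated lines with no spaces around the | separators, the first field being the wav file name without the .wav extension (the audio itself sitting in a wavs/ subfolder), followed by the raw and the normalized transcription. The helpers write_metadata and read_metadata are hypothetical; check the repository's LJSpeech preprocessing code for the exact fields it reads:

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# (file id without ".wav", raw transcription, normalized transcription)
Row = Tuple[str, str, str]


def write_metadata(dataset_dir: str, rows: Iterable[Row]) -> Path:
    """Write an LJSpeech-style metadata.csv: id|transcription|normalized transcription."""
    cleaned = [tuple(field.strip() for field in row) for row in rows]
    for row in cleaned:
        for field in row:
            # A stray "|" or line break would shift or split the columns.
            if "|" in field or "\n" in field or "\r" in field:
                raise ValueError(f"invalid character in field: {field!r}")
    path = Path(dataset_dir) / "metadata.csv"
    with open(path, "w", encoding="utf-8", newline="") as f:
        for file_id, text, normalized in cleaned:
            f.write(f"{file_id}|{text}|{normalized}\n")
    return path


def read_metadata(dataset_dir: str) -> List[Tuple[Path, str]]:
    """Return (wav path, normalized text) pairs, checking that each wav file exists."""
    root = Path(dataset_dir)
    pairs = []
    with open(root / "metadata.csv", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            parts = line.strip().split("|")
            wav_path = root / "wavs" / f"{parts[0]}.wav"
            if not wav_path.exists():
                raise FileNotFoundError(wav_path)
            pairs.append((wav_path, parts[2]))
    return pairs
```

Running read_metadata on the finished folder before preprocessing catches missing or misnamed wav files early. If your transcriptions are already normalized (numbers spelled out, and so on), repeating the same text in the second and third columns is a reasonable fallback.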

You will then run preprocess.py (see the preprocessing section of the README) to generate the processed dataset, which is saved to the output folder you specify.

Once this dataset is prepared, run the speaker adaptation command, updating the --data-root path to point to your own dataset:

python train.py --data-root=/directory/to/my/own/LJSpeech/Like/Dataset \
  --checkpoint-dir=checkpoints_for_adaptation \
  --preset=presets/deepvoice3_ljspeech.json \
  --log-event-path=log/deepvoice3_vctk_adaptation \
  --restore-parts="20171213_deepvoice3_checkpoint_step000210000.pth" \
  --speaker-id=0

I would also make checkpoints save more frequently in your hparams, since you don't need to train as long; otherwise you run the risk of overfitting to the small dataset, and the voice becomes unintelligible.

You will now hear the voice start to change from Linda Johnson to yours as the model trains.
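One way to save checkpoints more often without editing the shipped preset is to copy it with a few keys overridden and pass the copy to --preset. This sketch assumes the preset is a flat JSON object and uses the checkpoint_interval key name mentioned later in this thread; the helper make_preset, the output file name, and the value 1000 are hypothetical examples:

```python
import json
from typing import Any, Dict


def make_preset(base: str, out: str, **overrides: Any) -> Dict[str, Any]:
    """Copy a JSON preset, replacing only the given keys, and return the new settings."""
    with open(base, encoding="utf-8") as f:
        params = json.load(f)
    unknown = set(overrides) - set(params)
    if unknown:
        # Catch typos such as "checkpoint_intervall" instead of silently adding keys.
        raise KeyError(f"not in base preset: {sorted(unknown)}")
    params.update(overrides)
    with open(out, "w", encoding="utf-8") as f:
        json.dump(params, f, indent=2)
    return params
```

For example, make_preset("presets/deepvoice3_ljspeech.json", "presets/my_adaptation.json", checkpoint_interval=1000), then pass --preset=presets/my_adaptation.json to train.py. More frequent checkpoints give you more intermediate models to listen to, so you can keep one from before the voice starts to degrade.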

ryancwalsh commented 6 years ago

@G-Wang This is great to hear! Thank you so much. I've split large wav files (some 15 minutes long, some 9 hours) into very small files based on silences, and I'll discard any files longer than 10 seconds. Then I'll write the transcriptions into alignment.json. I'm very optimistic that by tomorrow or soon after I'll have a good new TTS voice. :-) Thanks.
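Discarding clips over 10 seconds can be scripted with Python's standard-library wave module, which reads the frame count and sample rate from each file's header. A sketch assuming uncompressed PCM .wav files (wave cannot open compressed formats) in a single folder; the helpers wav_duration and split_by_duration are hypothetical:

```python
import wave
from pathlib import Path
from typing import List, Tuple


def wav_duration(path: Path) -> float:
    """Duration in seconds, computed from the header (frames / sample rate)."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def split_by_duration(wav_dir: str, max_seconds: float = 10.0) -> Tuple[List[Path], List[Path]]:
    """Return (kept, too_long) lists of .wav files; nothing is deleted or moved."""
    kept, too_long = [], []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        (kept if wav_duration(path) <= max_seconds else too_long).append(path)
    return kept, too_long
```

Moving the too_long files aside rather than deleting them keeps the option of re-splitting them later.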

vignesh-almond commented 6 years ago

Hey @ryancwalsh, what's your progress on this? Were you able to tune the model to the speaker?

ryancwalsh commented 6 years ago

@vignesh-almond Unfortunately not. All I got (after many days of running the GPU) was gibberish that sounded like my speaker but wasn't intelligible.

And as a beginner in machine learning, I haven't yet found clear enough instructions on how to do it any better. Intro courses on machine learning tend to cover image classification (not voice cloning), so I haven't been able to figure out much yet.

aishweta commented 5 years ago

Hello, I'm getting an error with speaker adaptation.

I used M-AILABS data for speaker adaptation and followed the README:

  1. Cloned the repo: git clone https://github.com/r9y9/deepvoice3_pytorch
  2. Downloaded the checkpoints and saved them in the pre folder.
  3. Extracted 170 samples from the M-AILABS data and saved them in a data folder containing wavs and metadata.csv, in the same format as the LJSpeech data.
  4. Preprocessed the data as described in the README, which worked: python preprocess.py --preset=presets/deepvoice3_ljspeech.json ljspeech /home/mayur/shweta-new-ai/voice-cloning/new/deepvoice3_pytorch/data/ ./data/ljspeech
  5. For speaker adaptation training, I used the following command: python train.py --data-root=/data/ljspeech --checkpoint-dir=pre \ --preset=presets/deepvoice3_ljspeech.json \ --log-event-path=log \ --restore-parts="20180505_deepvoice3_checkpoint_step000640000.pth" --speaker-id=0

But I'm getting the error: usage: train.py [options]

I haven't committed anything; I'm working on master. Please help me with this, @G-Wang and @r9y9.

r9y9 commented 5 years ago

@shwetagargade216 That's likely because your command-line options are in the wrong format. I guess removing / should do the trick in your case. Also, please open a new issue if your problem is not related to this issue ("how to resume training").
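One way a pasted one-line command fails like this: each "\ " in the command above was meant as a line continuation, but once everything sits on one line, a POSIX shell reads backslash-space as an escaped space. The option after it then begins with a literal space, and train.py's option parser (the usage: message suggests docopt) no longer recognizes it as an option. Python's shlex module follows the same POSIX quoting rules, so it can show how the shell splits such a line; the command string here is shortened from the one above, and bad_options is a hypothetical helper. Separately, --data-root=/data/ljspeech is an absolute path, while the preprocessing step above wrote its output to the relative ./data/ljspeech.

```python
import shlex
from typing import List

one_line = (
    "python train.py --data-root=/data/ljspeech --checkpoint-dir=pre \\ "
    "--preset=presets/deepvoice3_ljspeech.json --speaker-id=0"
)


def bad_options(command: str) -> List[str]:
    """Return arguments that start with whitespace, i.e. options a parser won't match."""
    return [arg for arg in shlex.split(command) if arg != arg.lstrip()]


# The escaped space is glued onto the next option.
print(bad_options(one_line))  # [' --preset=presets/deepvoice3_ljspeech.json']

# Fix: drop the stray backslashes when everything is on one line (or, across
# several lines, end each line with a backslash immediately followed by a newline).
fixed = one_line.replace("\\ ", "")
print(bad_options(fixed))  # []
```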

aishweta commented 5 years ago

@G-Wang @r9y9 Could you please let me know which parameters in deepvoice3_ljspeech.json I should change for speaker adaptation? I used 590 audio samples (cmu_arctic data), batch_size = 24, initial_learning_rate = 0.005, epoch = 1000 and checkpoint_interval = 100, but I'm still not able to get a cloned voice. I trained the model for 2000 steps, then changed the batch size to 16 and trained up to 9000 steps.

Could you please help me out with this?
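For a sense of scale, the number of passes over the data is roughly steps × batch size ÷ number of clips. A sketch of that arithmetic for the figures above, assuming "up to 9000 steps" means 9000 steps in total (so 7000 steps at batch size 16 after the first 2000 at batch size 24) and ignoring any partial final batch:

```python
def epochs(steps: int, batch_size: int, n_clips: int) -> float:
    """Approximate number of passes over the dataset."""
    return steps * batch_size / n_clips


n_clips = 590
first = epochs(2000, 24, n_clips)   # steps at batch size 24
second = epochs(7000, 16, n_clips)  # remaining steps at batch size 16
print(f"{first:.1f} + {second:.1f} = {first + second:.1f} epochs")
# prints: 81.4 + 189.8 = 271.2 epochs
```

With only a few hundred clips, that is a lot of repetition, which is why listening to the intermediate checkpoints (one every 100 steps with checkpoint_interval = 100) and comparing earlier ones against later ones can help spot overfitting.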

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.