r9y9 / deepvoice3_pytorch

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
https://r9y9.github.io/deepvoice3_pytorch/

How to improve quality over time with more transcriptions? #105

Closed ryancwalsh closed 5 years ago

ryancwalsh commented 6 years ago

I have 70 minutes of transcribed audio clips for a new speaker. Each clip is at most 10 seconds long.

I started from the pretrained LJSpeech (Linda Johnson) checkpoint by running:

python train.py --data-root=./data/fresh --checkpoint-dir=checkpoints_fresh --preset=presets/deepvoice3_ljspeech.json --log-event-path=log/fresh --restore-parts="data\LJSpeech_1_1\20180505_deepvoice3_checkpoint_step000640000.pth" --speaker-id=0
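
My rough mental model of --restore-parts, in case I'm holding it wrong: it copies over only the pretrained parameters whose names and shapes match the new model, and leaves the rest (e.g. new speaker embeddings) freshly initialized. A sketch of that idea, assuming the checkpoint keeps its weights under a "state_dict" key; this is not the repo's exact code:

```python
import torch

def restore_parts_sketch(checkpoint_path, model):
    """Copy pretrained parameters into `model` wherever names and
    shapes match; everything else keeps its fresh initialization."""
    state = torch.load(checkpoint_path, map_location="cpu")["state_dict"]
    own = model.state_dict()
    own.update({k: v for k, v in state.items()
                if k in own and own[k].shape == v.shape})
    model.load_state_dict(own)
```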

Every day, I transcribe more audio clips of this new speaker. I assume that transcribing more and more clips will lead to better results. (And I remember that Linda Johnson recorded close to 24 hours of audio samples of sentences with considerable variety.)

@r9y9 I wonder if you know or could guess the answer to these questions:

  1. Is it OK that I lowered "batch_size" within deepvoice3_ljspeech.json to 10? (I did that because CUDA kept running out of memory and crashing; a sketch of the change follows this list.)
  2. How many minutes of transcribed audio clips would let me hear a result that sounds like the new speaker? (@G-Wang said 1.5 hours, and @Kyubyong (Kyubyong Park) says just 1 minute!)
  3. How many "steps" does the checkpoint need before the new speaker is trained enough to sound good? (I don't know exactly what a step means; my guess is in the training-loop sketch after this list.)
  4. After transcribing another X minutes of audio samples, I run python preprocess.py json_meta "C:\code\voice_cloning\audio\alignment.json" "./data/fresh" --preset=presets/deepvoice3_ljspeech.json. So then:

Is it okay for me to resume training by running python train.py --data-root=./data/fresh --checkpoint-dir=checkpoints_fresh --preset=presets/deepvoice3_ljspeech.json --log-event-path=log/fresh --checkpoint="checkpoints_fresh\checkpoint_step000017000.pth" --speaker-id=0, or must I start over from scratch each time I re-run preprocess.py after adding more transcriptions?

  5. Should I be monitoring anything and adjusting my approach based on the results? (I don't understand what graphs like step000025000_text4_single_alignment.png represent; my best guess at how to read them is after this list.)
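
For context on question 1, this is the kind of edit I mean, written to a copy of the preset so the stock file stays untouched (the copy's filename is just for illustration, and I'm assuming "batch_size" is the right hparam key):

```python
import json

# Load the stock LJSpeech preset, lower the batch size, and save a copy.
with open("presets/deepvoice3_ljspeech.json") as f:
    preset = json.load(f)

preset["batch_size"] = 10  # smaller batches so CUDA stops running out of memory

# Hypothetical filename; pass it to train.py via --preset.
with open("presets/deepvoice3_ljspeech_bs10.json", "w") as f:
    json.dump(preset, f, indent=2)
```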
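
On question 3, my current understanding (please correct me if wrong) is that a step is one optimizer update on one batch, so checkpoint_step000017000.pth means 17,000 batches have been processed. A generic, runnable sketch with toy stand-ins, not the repo's actual loop:

```python
import torch
from torch import nn

# Toy stand-ins just to make the loop runnable; the real model and data
# come from deepvoice3_pytorch.
model = nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters())
data_loader = [(torch.randn(10, 4), torch.randn(10, 1)) for _ in range(42)]

global_step = 0
for epoch in range(2):
    for x, y in data_loader:            # one batch per iteration
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()                # one "step" = one parameter update
        global_step += 1                # the number in checkpoint_step000017000.pth
```

If that's right, then with batch_size=10 and roughly 420 clips (70 minutes of clips of at most 10 seconds each), one pass over my data is about 42 steps, so 17,000 steps is on the order of 400 passes.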
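
And on question 5, my best guess is that the step*_alignment.png images plot the decoder's attention over the input text: a sharp diagonal means the model is reading the text monotonically in time, while a blurry or broken diagonal usually goes with garbled speech. A sketch of how such a plot is drawn, with random data standing in for a real attention matrix:

```python
import numpy as np
import matplotlib.pyplot as plt

# Random stand-in for the real attention weights, shaped
# (decoder timesteps, encoder timesteps).
alignment = np.random.rand(200, 60)

plt.imshow(alignment.T, aspect="auto", origin="lower", interpolation="none")
plt.xlabel("Decoder timestep (audio frames)")
plt.ylabel("Encoder timestep (characters)")
plt.colorbar(label="Attention weight")
plt.savefig("alignment_check.png")
```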

I really appreciate your help.

And as a thank-you, I want to share a tool I just built and have been using for the past couple of days to make transcription super fast. It uses an API that returns surprisingly accurate speech-to-text (Google's wasn't good enough in my experience):

https://send.firefox.com/download/00119bffbe/#ehNtuTyv9KIumI_VTdj7Dg

I hope it helps.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.