Model Parameter adjustments if training own voice

Hello!

I seem to be missing a general understanding of the embedding encoder and the synthesizer. Please allow me to post three questions here:

If I only want to use and optimize Angela Merkel's voice, wouldn't it make sense to delete all other voice inputs and keep only Merkel's? Or is at least one other (female) voice needed to make it easier for the model to train itself? What I did was delete all male voices and keep Merkel's voice and the one from eva_k. Or would it have been more (time-)efficient to use only the one target voice? And what about the parameter settings? In my case of two voices, I changed the model settings to

speakers_per_batch = 2
utterances_per_speaker = 1000

But I have absolutely no idea whether that makes sense. I have a GeForce 2070 with 8 GB of RAM, and I chose the parameters this way to use nearly 100% of that RAM. But I could also have set

speakers_per_batch = 20
utterances_per_speaker = 100

to take the same amount of RAM.
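To make the arithmetic behind these two settings explicit, here is a minimal Python sketch. The helper names are hypothetical and not taken from the repository. The second helper assumes a GE2E-style encoder loss, in which each utterance is contrasted against the other speakers in the batch; that is an assumption about the training objective, not something confirmed here.

```python
# Hypothetical helpers for illustration; they are not part of the repository.

def utterances_per_batch(speakers_per_batch: int, utterances_per_speaker: int) -> int:
    """Total number of utterances the encoder processes in one training step."""
    return speakers_per_batch * utterances_per_speaker


def contrast_speakers(speakers_per_batch: int) -> int:
    """Other speakers each utterance is compared against (assumes a GE2E-style loss)."""
    return speakers_per_batch - 1


config_a = utterances_per_batch(2, 1000)   # first setting from the post
config_b = utterances_per_batch(20, 100)   # second setting from the post

print(config_a, config_b)                          # total utterances per step
print(contrast_speakers(2), contrast_speakers(20)) # speakers to contrast against
```

Both settings feed the same number of utterances through the network per step, which fits the observation that they occupy a similar amount of RAM. Under the assumption above, they differ in how many other speakers each utterance is compared with.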

If I only want to train and optimize my own voice, am I right that I should train the model only with the (only existing) male voice and with my own audio samples, which I import with your cool Wikipedia read&record tool? And the same parameter question as above.

Today I used the toolbox for the very first time. My Angela voice has been trained for about 12 hours so far, and the result was impressive! Not perfect, of course; 12 hours are not enough, I know, but impressive. What I do not understand: if I enter a German text into the text field and press the synthesize&vocode button again and again, the audio output quality changes every time. But why? I thought the same model (embedding and synthesizer) is always used. Same input = same output. So why does it change every time?
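One possible source of such run-to-run variation, offered as an assumption rather than a confirmed diagnosis of this toolbox: Tacotron-style synthesizers are commonly described as keeping dropout active in the decoder pre-net at inference time, so identical input can pass through a different random mask on every call. The toy sketch below uses only the Python standard library; `toy_prenet` is a hypothetical stand-in, not a function from the repository.

```python
import random


def toy_prenet(features, drop_prob=0.5, rng=None):
    """Toy layer whose dropout stays active at inference (hypothetical stand-in)."""
    rng = rng or random
    keep = 1.0 - drop_prob
    # Each value is either zeroed or rescaled, decided by a fresh random draw.
    return [v / keep if rng.random() < keep else 0.0 for v in features]


features = [0.1 * i for i in range(1, 9)]

# Two calls with the same input usually yield different outputs.
run_1 = toy_prenet(features)
run_2 = toy_prenet(features)

# With an identically seeded generator, the output is reproducible.
seeded_1 = toy_prenet(features, rng=random.Random(0))
seeded_2 = toy_prenet(features, rng=random.Random(0))
same_when_seeded = seeded_1 == seeded_2

print(run_1 == run_2, same_when_seeded)
```

If this assumption holds for the synthesizer used here, fixing the random seed before each synthesis call would be one way to check whether the variation disappears.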

Best regards,
Marc

padmalcom / Real-Time-Voice-Cloning-German

Model Parameter adjustments if training own voice #12