Hey, I am back again with another question :P

Can I interpret the two-stage training scheme as follows?

1. Training of the CTC-Attention phoneme recognizer, the speaker encoder, and the vocoder. These three can be trained separately, each on its own.
2. Training of the seq2seqMoL model, which needs the outputs of the CTC-Attention phoneme recognizer and the speaker encoder. Each training instance looks like (A's sentence_1, B's sentence_x, B's sentence_1), and the MSE loss is computed between the model's output for B's sentence_1 and the ground-truth B's sentence_1. (To check my understanding, I sketched this step in code below.)
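Here is a minimal PyTorch-style sketch of how I picture one stage-2 training step. All names here (`stage2_step`, `phoneme_recognizer`, `speaker_encoder`, `seq2seq_mol`, and the `mel_*` tensors) are hypothetical placeholders of mine, not the repo's actual API:

```python
import torch
import torch.nn.functional as F

def stage2_step(phoneme_recognizer, speaker_encoder, seq2seq_mol, optimizer,
                mel_a1, mel_bx, mel_b1):
    # mel_a1: mel-spectrogram of A's sentence_1 (source content)
    # mel_bx: mel-spectrogram of B's sentence_x (speaker reference)
    # mel_b1: mel-spectrogram of B's sentence_1 (ground truth)

    # Stage-1 models are assumed frozen here, used only as feature extractors.
    with torch.no_grad():
        ppg = phoneme_recognizer(mel_a1)      # phonetic content from A's sentence_1
        spk_emb = speaker_encoder(mel_bx)     # speaker identity from B's sentence_x

    pred_mel = seq2seq_mol(ppg, spk_emb)      # model's guess at B's sentence_1

    loss = F.mse_loss(pred_mel, mel_b1)       # MSE against ground-truth B's sentence_1
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Is that roughly what happens in stage 2, or does the training pair up utterances differently?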
Please correct me if I am wrong.
Thanks!