sourcesur opened 4 months ago
Just easier to implement and more modular. Keeping it open-ended makes it easier to run experiments.
I wanted to reproduce the results from the paper, so I used this repo (master branch) to train the model on LibriTTS. I trained it for 800k steps (and longer), but the overall generation quality is still quite far from the official demo. Have you tried reproducing the results?
Try playing around with the speech prompt encoder. I have not trained this model myself yet.
Hi, thanks for your effort and for sharing the code! The architecture blocks in the speech-prompted text encoder and the CFM decoder differ from the ones introduced in the paper. I would like to know what motivated these changes. Was the model not converging with the official architecture?