About model implementation differences

p0p4k / pflowtts_pytorch

Unofficial implementation of NVIDIA P-Flow TTS paper

https://neurips.cc/virtual/2023/poster/69899

MIT License


About model implementation differences #35

Open: sourcesur opened 4 months ago

sourcesur commented 4 months ago

Hi, thanks for your effort and for sharing the code! The architecture blocks in the speech-prompted text encoder and the CFM decoder differ from the ones introduced in the paper. I would like to know what motivated these changes. Was the model not converging with the official architecture?

p0p4k commented 4 months ago

It was just easier to implement and more modular. I'm keeping it open-ended so that it is more accessible for running experiments.

sourcesur commented 4 months ago

I wanted to reproduce the results from the paper, so I used this repo (master branch) to train the model on LibriTTS. I trained it for 800k steps and longer, but the overall generation quality is still quite far from the official demo. Have you tried reproducing the results?

p0p4k commented 4 months ago

Try playing around with the speech prompt encoder. I have not trained this model yet.