p0p4k / pflowtts_pytorch

Unofficial implementation of NVIDIA P-Flow TTS paper
https://neurips.cc/virtual/2023/poster/69899
MIT License
198 stars 28 forks

Works only with n_feats=80 #26

Open patriotyk opened 5 months ago

patriotyk commented 5 months ago

I tried setting n_feats=100 and training, but it looks like 80 is hardcoded in several places: I got crashes saying a size of 80 is expected.

p0p4k commented 5 months ago

Yes, manually replace 80 with 100. Several modules produce outputs of size 80; they can be edited to change their output sizes, so it is best to replace 80 with the new value wherever needed. My other encodec branch outputs 128 features; you can use it for reference.
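One way to locate the hardcoded values before editing them by hand is to scan the source for the integer literal 80. The snippet below is only a minimal sketch using Python's standard `ast` module; the helper name `find_int_literals` and the example source string are hypothetical and not taken from this repository, and not every hit is necessarily a feature dimension.

```python
import ast


def find_int_literals(source: str, value: int = 80):
    """Return sorted (line, column) positions of integer literals equal to `value`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # type(...) is int excludes bools and floats such as True or 80.0
        if (
            isinstance(node, ast.Constant)
            and type(node.value) is int
            and node.value == value
        ):
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)


# Hypothetical snippet for illustration only; it is parsed, never executed.
example = (
    "import torch\n"
    "proj = torch.nn.Conv1d(192, 80, 1)\n"
    "n_feats = 80\n"
    "hop = 256\n"
)

hits = find_int_literals(example)
for line, col in hits:
    print(f"literal 80 at line {line}, column {col}")
```

To scan real files, read each `.py` file's text and pass it to the helper; each reported position still needs a manual check to confirm that it refers to the mel feature size.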

patriotyk commented 5 months ago

Thank you!!!

lumpidu commented 5 months ago

Could you elaborate on when it would be useful to increase the n_feats setting?

patriotyk commented 5 months ago

It depends on the vocoder. If your vocoder was trained with 100 mel features, the pflow model should also generate mels with 100 features.
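A small check can catch this kind of mismatch before training starts. This is a rough sketch under stated assumptions: the dictionary keys and the values in both configs are hypothetical, chosen to mirror the 80 vs. 100 case discussed here, and they are not read from any real pflow or vocoder config.

```python
def mismatched_settings(model_cfg: dict, vocoder_cfg: dict) -> dict:
    """Map each shared key whose values differ to (model_value, vocoder_value)."""
    return {
        key: (model_cfg[key], vocoder_cfg[key])
        for key in model_cfg.keys() & vocoder_cfg.keys()
        if model_cfg[key] != vocoder_cfg[key]
    }


# Hypothetical settings for illustration only.
model_cfg = {"n_feats": 80, "sample_rate": 22050}
vocoder_cfg = {"n_feats": 100, "sample_rate": 22050}

mismatches = mismatched_settings(model_cfg, vocoder_cfg)
if mismatches:
    print("Settings that differ (model, vocoder):", mismatches)
```

An empty result only means the compared keys agree; it does not guarantee that the two models were trained on identically computed mels.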

lumpidu commented 5 months ago

OK, understood. But I would like to know more about the intuition behind changing this parameter: do you want to increase the quality or the frequency resolution, or, by reducing it, improve inference speed?

patriotyk commented 5 months ago

In my case, I just wanted to try pflow with the pretrained vocos vocoder and compare it with the pretrained hifigan. I suppose it should sound better, but I don't know for sure.