requirements & phonemizer?

zidsi commented 8 months ago

First of all - thank you for your effort and quick implementation.

In the "Instructions to run" section of the README, you suggest

pip install -r requirements.txt

However, it looks like requirements.txt is missing from the repo (not critical).

An additional question about dataset preparation. The paper says: "A G2P model [5] preprocesses the text into the International Phonetic Alphabet (IPA) format."

Do you train on IPA phonemes or on raw characters?

p0p4k commented 8 months ago

My implementation is mostly a POC right now, just to test whether this concept works well. I did a somewhat similar training run using VITS-1, but the model collapsed (in retrospect, I can see my mistakes, such as not freezing the posterior encoder). In this pflow implementation, I am adding 3 types of flow estimators: a U-Net flow and two types of modified WaveNet flows. Also, the speech_prompt_encoder is slightly different from the one in the paper (mine tries to extract more features before attending).
I think the text folder in this repo contains a phonemizer with an espeak backend, so we use IPA. (I've updated requirements.txt.)
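To make the IPA-vs-characters question concrete: once text has been phonemized, the model only sees integer IDs looked up in a fixed symbol table. The sketch below assumes a hypothetical, deliberately tiny inventory; `SYMBOLS`, `SYMBOL_TO_ID` and `ipa_to_sequence` are illustrative names, not this repo's code, and the phonemization step itself (espeak via phonemizer) is not shown.

```python
# Hypothetical symbol inventory: padding, a few punctuation marks, a space,
# and a small subset of IPA letters plus stress/length marks. A real model
# would enumerate every symbol espeak can emit for the target language.
PAD = "_"
PUNCTUATION = ";:,.!?"
IPA_LETTERS = "abdefhijklmnoprstuvwzæðŋɑɔəɛɪʃʊʌʒθɹˈˌː"

SYMBOLS = [PAD] + list(PUNCTUATION) + [" "] + list(IPA_LETTERS)
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def ipa_to_sequence(ipa_text: str) -> list[int]:
    """Map an already-phonemized IPA string to integer IDs, one per symbol.

    Symbols missing from the inventory are silently dropped (a hypothetical
    policy; one could instead raise an error or map them to an <unk> ID).
    """
    return [SYMBOL_TO_ID[ch] for ch in ipa_text if ch in SYMBOL_TO_ID]


print(ipa_to_sequence("həˈloʊ"))  # one ID per IPA symbol, stress mark included
```

With IPA input, the stress mark `ˈ` is a token of its own, so the model gets pronunciation cues that a raw-character model would have to infer from spelling.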

zidsi commented 8 months ago

Thanks for the info. I see the phonemizer in the text cleaners :( I was able to "skip" the G2P step for VITS2, and it works well.
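For comparison, "skipping" G2P typically means replacing the phonemizing cleaner with a plain normalizing one and tokenizing raw characters. The following is a minimal sketch under stated assumptions: `basic_cleaner`, `text_to_char_sequence` and the lowercase-Latin character set are hypothetical (similar in spirit to the basic cleaners in Tacotron/VITS-style codebases), and a real inventory would need every letter of the target language, including diacritics.

```python
import re
import unicodedata

# Hypothetical character inventory for a raw-character model (no G2P):
# padding, punctuation, space, and lowercase Latin letters only.
PAD = "_"
PUNCTUATION = ";:,.!?'-"
LETTERS = "abcdefghijklmnopqrstuvwxyz"

CHAR_SYMBOLS = [PAD] + list(PUNCTUATION) + [" "] + list(LETTERS)
CHAR_TO_ID = {c: i for i, c in enumerate(CHAR_SYMBOLS)}


def basic_cleaner(text: str) -> str:
    """Normalize text without phonemizing: NFKC, lowercase, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def text_to_char_sequence(text: str) -> list[int]:
    """Clean raw text and map each known character to its ID; unknown chars are dropped."""
    return [CHAR_TO_ID[c] for c in basic_cleaner(text) if c in CHAR_TO_ID]


print(basic_cleaner("  Hello,\n  World!  "))  # hello, world!
```

The trade-off is that the model must learn spelling-to-sound rules itself, which tends to be easier for languages with fairly regular orthography.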

p0p4k / pflowtts_pytorch