zidsi closed this issue 8 months ago
pflow implementation, I am adding 3 types of flow-estimators, a U-Net flow and two types of modified wavenet flows. Also, the speech_prompt_encoder is slightly different from the paper (mine tries to extract more features before attending).

Thanks for the info. I see phonemizer in the text cleaners :( I was able to "skip" the G2P step for VITS2 and it works well.
First of all - thank you for your effort and quick implementation.
In the "Instructions to run" section of the README you suggest installing from requirements.txt, however it looks like requirements.txt is missing from the repo (not critical).
An additional question about dataset preparation. The paper says: "A G2P model [5] preprocesses the text into the International Phonetic Alphabet (IPA) format."
Do you train using IPA phonemes or straight chars?
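
To make the two options concrete, here is a rough sketch of what a character-based front end (no G2P) could look like. This is not the code from this repo: `SYMBOLS`, `basic_cleaner`, and `text_to_sequence` are hypothetical names, and the symbol inventory is an assumption chosen only for illustration. The optional `g2p` argument marks where a phonemizer step would plug in if IPA input were used instead.

```python
import re
import unicodedata
from typing import Callable, Dict, List, Optional

# Hypothetical symbol inventory for character-level input (an assumption,
# not the repo's actual symbol table).
_PAD = "_"
_PUNCT = " !',.?-"
_LETTERS = "abcdefghijklmnopqrstuvwxyz"
SYMBOLS = [_PAD] + list(_PUNCT) + list(_LETTERS)
SYMBOL_TO_ID: Dict[str, int] = {s: i for i, s in enumerate(SYMBOLS)}


def basic_cleaner(text: str) -> str:
    """Strip accents, lowercase, and collapse whitespace. No G2P involved."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_accents.lower()).strip()


def text_to_sequence(
    text: str,
    symbol_to_id: Dict[str, int] = SYMBOL_TO_ID,
    g2p: Optional[Callable[[str], str]] = None,
) -> List[int]:
    """Map text to symbol IDs.

    With g2p=None the cleaned characters are used directly ("straight chars").
    If a G2P callable is supplied, its output string is tokenized instead, and
    symbol_to_id must then cover the phoneme set it produces.
    Symbols missing from symbol_to_id are silently dropped.
    """
    cleaned = basic_cleaner(text)
    if g2p is not None:
        cleaned = g2p(cleaned)
    return [symbol_to_id[ch] for ch in cleaned if ch in symbol_to_id]
```

With `g2p=None` the model sees plain characters, which is the "skip G2P" route mentioned above; passing a phonemizer wrapper and an IPA symbol table would give the setup the paper describes.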