insunhwang89 / CyFi-TTS


About Cyclic Normalizing Flow #1

Open nuts-kun opened 1 year ago

nuts-kun commented 1 year ago

Hi, thank you for your great work! :) I could not find your email address in the paper or in your Google Scholar bio, so please let me ask my question in this issue.

I have a question about the Cyclic Normalizing Flow. In Equation 3 of the paper, the cycle consistency loss is defined as the KL divergence between $p(z''|x)$ and $p(z'|x)$. Since $p(z''|x) = f(f^{-1}(p(z'|x)))$ and VITS uses a volume-preserving flow, I think the following holds unless Dropout is used in the normalizing flow (ignoring the effect of padding at the edges): $L_{cc} = \mathrm{KL}[p(z''|x) \,\|\, p(z'|x)] = \mathrm{KL}[p(z'|x) \,\|\, p(z'|x)] = 0$. In the original VITS implementation the Dropout rate is 0, and the paper does not mention using Dropout in the normalizing flow. Did the experiments in the paper use Dropout?
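To make the round-trip argument concrete, here is a minimal sketch of a single additive (volume-preserving) coupling layer in plain Python. It is not the CyFi-TTS or VITS code: the names `shift`, `flow_forward`, `flow_inverse`, and the scalar weight `w` are hypothetical, and a `tanh` stands in for the coupling network. With no Dropout, the same weights are used in both directions, so $f(f^{-1}(z'))$ recovers $z'$ up to floating-point rounding.

```python
import math

def shift(z1, w):
    # Hypothetical stand-in for the coupling network m(z1); deterministic, no Dropout.
    return [math.tanh(w * v) for v in z1]

def flow_forward(z, w):
    # Additive coupling: first half passes through, second half is shifted by m(z1).
    half = len(z) // 2
    z1, z2 = z[:half], z[half:]
    return z1 + [b + s for b, s in zip(z2, shift(z1, w))]

def flow_inverse(y, w):
    # Exact inverse: subtract the same shift, which depends only on the untouched half.
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    return y1 + [b - s for b, s in zip(y2, shift(y1, w))]

def round_trip_error(z, w):
    # Largest element-wise gap between z' and z'' = f(f^{-1}(z')).
    z_cycle = flow_forward(flow_inverse(z, w), w)
    return max(abs(a - b) for a, b in zip(z, z_cycle))

z_prime = [0.3, -1.2, 0.7, 2.5]  # illustrative values, even length assumed
err = round_trip_error(z_prime, w=1.7)
print(err)
```

The printed gap is at the level of rounding error, which is why any distance between $z''$ and $z'$ computed this way would be expected to vanish.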

I know you are currently busy during ICASSP, but I would be grateful if you could reply when you are free :).

insunhwang89 commented 1 year ago

As you suggested, the representations in the forward and backward directions should be equal. However, a mismatch arises because the input vector in each direction is different: the linguistic representation is produced by the prior encoder, while the posterior representation is produced by the posterior encoder. Therefore, we wanted to match the forward and backward directions using only the linguistic representation.

nuts-kun commented 1 year ago

Thanks for your reply :)

I agree with your point. Other works such as NaturalSpeech also show that enhancing the prior and reducing the posterior are really important for improving TTS quality. My only question is how the model is trained with the cycle consistency loss. In my understanding, the cycle consistency loss should be 0, so its gradients should also be 0. If the gradients are 0, I think the model is not updated by this loss term. Therefore, I wonder why this loss improves the model. Could you explain this point?
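The zero-gradient concern can be checked numerically with a small sketch. The assumptions here are mine, not the paper's: a squared error between $z''$ and $z'$ stands in for the KL term, the flow is one additive coupling layer with a hypothetical scalar weight `w`, and the gradient is estimated by central finite differences. Because the round trip is the identity for every value of `w`, the loss is flat in `w` and the estimated gradient is numerically zero.

```python
import math

def coupling(z, w, sign):
    # Additive coupling; sign=+1.0 is the forward pass, sign=-1.0 is the inverse.
    half = len(z) // 2
    z1, z2 = z[:half], z[half:]
    return z1 + [b + sign * math.tanh(w * a) for a, b in zip(z1, z2)]

def cycle_loss(z, w):
    # Squared error between z' and z'' = f(f^{-1}(z')), used as a stand-in for KL.
    z_cycle = coupling(coupling(z, w, -1.0), w, +1.0)
    return sum((a - b) ** 2 for a, b in zip(z, z_cycle))

def finite_diff_grad(z, w, eps=1e-4):
    # Central-difference estimate of d(cycle_loss)/dw.
    return (cycle_loss(z, w + eps) - cycle_loss(z, w - eps)) / (2 * eps)

z_prime = [0.3, -1.2, 0.7, 2.5]  # illustrative values, even length assumed
loss = cycle_loss(z_prime, 1.7)
grad = finite_diff_grad(z_prime, 1.7)
print(loss, grad)
```

Under these assumptions the loss term would contribute no update to `w`. A nonzero gradient would require the two passes to differ in some way, for example through Dropout or through different inputs in each direction.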

Also, if you can share your code, it would be a great help for my understanding. Thank you :)

nuts-kun commented 1 year ago

Hi, how about the above points?