huawei-noah / Speech-Backbones

This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

Typo in some equations in GradTTS paper #25

Closed by cantabile-kwok 1 year ago

cantabile-kwok commented 1 year ago

Thanks for your great work on GradTTS! However, I recently found a small error in arXiv v2 of the GradTTS paper (https://arxiv.org/pdf/2105.06337.pdf). In Eq. 31 and Eq. 32 in the appendix, $X_t$ and $\mu$ appear in the wrong order, i.e. it should probably be $\mu - X_t$ rather than $X_t - \mu$. This typo may originate from the line above Eq. 31, "In our case $f(X_t, t) = \frac 12 \Sigma^{-1}(X_t - \mu)\beta$ and ...", where it should be $f(X_t, t) = \frac 12 \Sigma^{-1}(\mu - X_t)\beta$ instead. The other parts of the paper do not seem to be affected by this, and the derivations are solid and easy to follow. Again, many thanks for the work!

ivanvovk commented 1 year ago

@cantabile-kwok Hi! These are different differential equations. Eq. 2 is the forward diffusion, while Eqs. 31 and 32 define the reverse diffusion. The formulas seem to be correct.

cantabile-kwok commented 1 year ago

@ivanvovk Hi! Yes, they are indeed different SDEs, but Eq. 2 implies $f(X_t, t) = \frac{1}{2} \Sigma^{-1}(\mu - X_t)\beta$, right? Then Eqs. 31 and 32 should both use $\mu - X_t$ rather than $X_t - \mu$. It's just a matter of sign, actually.

Also, in the main text, you have Eq.8 and 9 that show it should be $\mu-X_t$ too : )
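For readers following the sign argument, here is a sketch of how the forward drift carries over to the reverse-time equations. It assumes the standard reverse-time SDE result (Anderson, 1982) and writes the noise schedule as $\beta_t$, with $g(t) = \sqrt{\beta_t}$. The symbols $g$, $\widetilde{W}_t$ and $p_t$ are notation chosen for this sketch and may not match the paper's appendix exactly.

$$
\begin{aligned}
\text{forward:}\quad & dX_t = \underbrace{\tfrac{1}{2}\Sigma^{-1}(\mu - X_t)\,\beta_t}_{f(X_t,\,t)}\,dt + \sqrt{\beta_t}\,dW_t \\
\text{reverse (general):}\quad & dX_t = \left[f(X_t, t) - g(t)^2\,\nabla \log p_t(X_t)\right]dt + g(t)\,d\widetilde{W}_t \\
\text{reverse (substituted):}\quad & dX_t = \left(\tfrac{1}{2}\Sigma^{-1}(\mu - X_t) - \nabla \log p_t(X_t)\right)\beta_t\,dt + \sqrt{\beta_t}\,d\widetilde{W}_t
\end{aligned}
$$

Under these assumptions the term $\mu - X_t$ keeps its sign when moving from the forward to the reverse equation, which is the point being made about Eqs. 8 and 9.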

ivanvovk commented 1 year ago

@cantabile-kwok Oh yeah, sorry, I see... I was looking at a different place the first time. You're right, thank you for pointing this out. We need to fix it.

cantabile-kwok commented 1 year ago

@ivanvovk No worries, the code is correct. Anyway the appendix is also clear enough 👍
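A quick numerical sanity check of the sign discussed in this thread can be sketched as follows. This is not code from the repository. It is a minimal illustration that assumes a scalar state, $\Sigma = 1$ and a constant $\beta$ (the paper uses a time-dependent schedule), and integrates only the mean of the forward diffusion, $dm/dt = \pm\frac{1}{2}\beta(\mu - m)$. The helper `euler_mean` and all parameter values are hypothetical.

```python
import math


def euler_mean(x0, mu, beta, sign, t_end=1.0, n_steps=1000):
    """Explicit Euler for dm/dt = sign * 0.5 * beta * (mu - m).

    sign=+1 uses the drift 0.5 * beta * (mu - m), as in the forward SDE.
    sign=-1 uses the flipped drift 0.5 * beta * (m - mu).
    Assumptions for this sketch: scalar state, Sigma = 1, constant beta.
    """
    dt = t_end / n_steps
    m = x0
    for _ in range(n_steps):
        m += sign * 0.5 * beta * (mu - m) * dt
    return m


# Hypothetical values chosen only for illustration.
x0, mu, beta = 3.0, 1.0, 10.0

forward = euler_mean(x0, mu, beta, sign=+1)
flipped = euler_mean(x0, mu, beta, sign=-1)

# Closed-form mean of the forward process at t = 1 for constant beta.
exact = mu + (x0 - mu) * math.exp(-0.5 * beta * 1.0)

print(f"forward drift (mu - x): mean at t=1 is {forward:.4f}, exact {exact:.4f}")
print(f"flipped drift (x - mu): mean at t=1 is {flipped:.4f}")
```

With the $\mu - X_t$ drift the mean contracts toward $\mu$, which is what a forward diffusion ending near $\mathcal{N}(\mu, \Sigma)$ requires. With the flipped sign it moves away from $\mu$ instead.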