Open yuedajiong opened 1 year ago
It's not exactly formula 5; they are only referring to $\Sigma$. The naive way to parametrize this matrix is to take 6 numbers (the upper triangle, since the matrix is symmetric) and optimize them independently. The problem is that, as the optimization proceeds, the matrix is not guaranteed to stay positive semi-definite (which it must be, because it's a covariance matrix). So, to guarantee that it is always positive semi-definite no matter what gradient descent does, they designed formula 6. A matrix of the form $A^TA$ is always positive semi-definite, since $x^TA^TAx = \|Ax\|^2 \ge 0$ for any $x$. This trick is called reparametrization, and it is often used to keep the optimization from producing invalid results.
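To make the difference concrete, here is a minimal sketch in plain Python. It assumes formula 6 has the form $\Sigma = R S S^T R^T$, with $S$ a diagonal scaling matrix and $R$ a rotation built from a quaternion; the thread does not quote the formula, so treat that as an assumption. The function names (`quat_to_rot`, `covariance_from_params`, `covariance_naive`, `quad_form`) are made up for illustration and are not from the authors' code.

```python
import math

def quat_to_rot(q):
    """Normalize a quaternion (w, x, y, z) and return its 3x3 rotation matrix."""
    n = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / n for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance_from_params(q, s):
    """Reparametrized covariance (assumed form): Sigma = A A^T with A = R S."""
    R = quat_to_rot(q)
    # Multiplying by S = diag(s) on the right scales the columns of R.
    A = [[R[i][j] * s[j] for j in range(3)] for i in range(3)]
    return [[sum(A[i][k] * A[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

def covariance_naive(u):
    """Naive parametrization: 6 free numbers fill the upper triangle."""
    a, b, c, d, e, f = u
    return [[a, b, c], [b, d, e], [c, e, f]]

def quad_form(M, v):
    """v^T M v; this must be >= 0 for every v if M is positive semi-definite."""
    return sum(v[i] * M[i][j] * v[j] for i in range(3) for j in range(3))

# The naive matrix is symmetric, but a gradient step can land on values like these:
bad = covariance_naive((1, 2, 0, 1, 0, 1))
print(quad_form(bad, (1, -1, 0)))  # -2, so this is not a valid covariance

# The reparametrized matrix stays valid for any nonzero q and any s:
good = covariance_from_params((0.3, -1.2, 0.5, 2.0), (0.1, -4.0, 0.0))
print(quad_form(good, (1, -1, 0)))
```

With the second form, the optimizer updates `q` and `s` freely, and every value it reaches still maps to a symmetric positive semi-definite matrix, so no projection or constraint is needed after a gradient step.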
Oh my god, @kwea123, great master!!! I watched your technical talk about instant-ngp on bilibili.com, a site in mainland China.
Thanks!!!
I also read this link: https://en.wikipedia.org/wiki/Multivariate_normal_distribution, and now I understand:
logic, step by step: (representation)
The authors say that gradient descent cannot directly optimize a VALID covariance matrix like the one in formula #5, so they designed formula #6.
Can anybody provide more explanations?