Closed AI-Ahmed closed 1 year ago
I found this answer exciting and really intuitive (sorry if my question was silly; linear algebra is essential to dive deep into this)!
I thought that $r_k$ was a square matrix, but now I understand that $r_k^{\top}$ is a $1 \times n$ matrix while $r_k$ is an $n \times 1$ matrix.
That means $$r_k^{\top} r_k = \sum_{i=1}^{n} r_{k,i} \cdot r_{k,i},$$ i.e., a scalar. Ref: https://math.stackexchange.com/questions/1853808/product-of-a-vector-and-its-transpose-projections
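Just to check my own understanding, here is a tiny PyTorch sketch (not from the repo, just something I tried) showing that for 1-D tensors the `torch.sum(r * r)` pattern and an explicit dot product give the same scalar, so no transpose is needed:

```python
import torch

# For a 1-D tensor r, r^T r is just the sum of elementwise products,
# so no explicit transpose is needed in the code.
n = 5
r = torch.randn(n)

rdotr_via_sum = torch.sum(r * r)  # the pattern used in the code (rdotr)
rdotr_via_dot = torch.dot(r, r)   # the same scalar written as a dot product

print(torch.allclose(rdotr_via_sum, rdotr_via_dot))  # True
```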
If there is anything else, please let me know!
I have been searching to understand more about the conjugate gradient algorithm. It was a really ingenious idea from Schulman, Prof. Pieter Abbeel, et al.
Thanks also to you all for contributing to this and implementing the algorithm.
When I looked at the mathematical algorithm, there were things that confused me!
In the algorithm, there is `rdotr`, but I don't understand why we didn't transpose one of the `r` vectors before multiplying it by itself. The same question applies to `(torch.sum(p*z) + 1e-8)`. Please, if I am missing something, direct me. Thanks.
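For context, this is roughly the conjugate gradient loop I have in mind (a minimal sketch from my own reading, not the repo's exact code; the name `Avp` for the Hessian-vector product function is my own):

```python
import torch

def conjugate_gradient(Avp, b, n_iters=10, tol=1e-10):
    """Solve A x = b using only Avp(v) = A @ v, without forming A explicitly."""
    x = torch.zeros_like(b)
    r = b.clone()            # residual r_0 = b - A x_0 = b, since x_0 = 0
    p = r.clone()            # initial search direction
    rdotr = torch.dot(r, r)  # r^T r as a plain dot product of 1-D tensors

    for _ in range(n_iters):
        z = Avp(p)                                  # Hessian-vector product A p
        alpha = rdotr / (torch.sum(p * z) + 1e-8)   # step size; 1e-8 guards against division by zero
        x += alpha * p
        r -= alpha * z
        new_rdotr = torch.dot(r, r)
        if new_rdotr < tol:
            break
        p = r + (new_rdotr / rdotr) * p             # update search direction
        rdotr = new_rdotr
    return x
```

With this reading, `rdotr` is already the scalar $r^{\top} r$, which is why no explicit transpose appears in the code.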