Closed Antymon closed 5 years ago
The derivative of the trace is the trace of the derivative: Both the trace and the derivative are linear operators, so we can exchange them.
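For the case of a matrix-valued function of a scalar, this exchangeability is easy to check numerically. A minimal sketch (the function `A(x)` below is an arbitrary made-up example, not from the book):

```python
import numpy as np

# A(x): a matrix-valued function of a scalar x (arbitrary example).
# Since trace and d/dx are both linear, tr(dA/dx) should equal d tr(A)/dx.
def A(x):
    return np.array([[x**2,      np.sin(x)],
                     [np.exp(x), 3.0 * x]])

x0, h = 0.7, 1e-6

# Differentiate the matrix first (central differences), then take the trace ...
dA = (A(x0 + h) - A(x0 - h)) / (2 * h)
lhs = np.trace(dA)

# ... versus taking the trace first, then differentiating the scalar tr(A(x)).
rhs = (np.trace(A(x0 + h)) - np.trace(A(x0 - h))) / (2 * h)

assert np.isclose(lhs, rhs)
```

Here tr(A(x)) = x^2 + 3x, so both sides come out to 2*x0 + 3.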
One of the things that we don't discuss here (and that we also don't really want to get into) is that the trace is only defined for matrices. If we operate with tensors, the trace turns into a tensor contraction (where we sum out two dimensions).
Clearly I have a problem understanding this interchangeability of linear operators in this setting. As I understand it, the trace of whatever is put in as an argument is a scalar. Is that incorrect? Is the trace of a derivative not necessarily a scalar? Of course, derivatives of traces of matrices are matrices, hence the contradiction when trying to apply this general rule.
That's not necessarily correct. The trace of a (DxDxExE) tensor is ExE (if you generalize the trace to a tensor contraction along the first two dimensions). Again, this is stuff I really don't want to get into. And that's why the trace of a derivative may not be a scalar.
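To illustrate the shape claim above: NumPy's `trace` already supports contracting a chosen pair of axes, so a quick sanity check (with made-up values for D and E) looks like this:

```python
import numpy as np

# Generalized trace: contracting the first two dimensions of a
# (D, D, E, E) tensor leaves an (E, E) matrix.
D, E = 3, 2
rng = np.random.default_rng(0)
T = rng.standard_normal((D, D, E, E))

# np.trace lets you pick the axes to contract; summing out axes 0 and 1
# is exactly the contraction T[i, i, :, :] summed over i.
contracted = np.trace(T, axis1=0, axis2=1)
assert contracted.shape == (E, E)

# The same contraction written explicitly with einsum.
assert np.allclose(contracted, np.einsum('iijk->jk', T))
```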
Regarding exchangeability: https://en.wikipedia.org/wiki/Trace_%28linear_algebra%29
Similar arguments apply to the transpose (which is also only defined for matrices, not for general tensors).
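A quick sketch of why the transpose doesn't generalize cleanly either: for a matrix, `.T` is unambiguous, but for a higher-order array NumPy's `.T` simply reverses all axes, and the intended axis permutation has to be spelled out explicitly.

```python
import numpy as np

# For a matrix, "transpose" is unambiguous.
M = np.arange(6).reshape(2, 3)
assert M.T.shape == (3, 2)

# For a 3rd-order tensor, .T reverses all axes; which permutation counts
# as "the" transpose must be chosen explicitly, e.g. swapping the first two:
T = np.zeros((2, 3, 4))
assert T.T.shape == (4, 3, 2)
assert np.transpose(T, axes=(1, 0, 2)).shape == (3, 2, 4)
```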
Fair. Even with matrices, without going into tensors, I don't understand how this interchangeability can hold. But well, clearly that's not an issue of the book's correctness.
Effectively, the question is what to do about it. I agree that this issue causes confusion. I would like to keep this section in the book because it's useful. I'm considering adding a comment that points out these issues, referring to other sources for further clarification.
Would this be a solution for you as well?
Ah, I was never after the whole section. It's just properties 5.113 and 5.114 that I don't get. If you have any sources that explain those in more detail, then surely there's a chance they would be helpful. But it seems I would personally need something more elaborate than the likes of The Matrix Cookbook. You mention more books in that chapter; would any of those be helpful with the issues I'm facing?
I'm not sure whether they will be more helpful to be honest. I need to find one that makes sense and is semi-comprehensible.
Btw, can we make any use of rule 5.113 at all without knowing the definition of the trace for tensors?
Not really. That's why I want to add a comment.
I could alternatively formulate things just for the scalar case, but the equations do hold for the vector-valued cases, too.
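For what it's worth, the two views can be reconciled in a concrete matrix-argument case. A sketch assuming f(X) = tr(AX), whose gradient per the Matrix Cookbook is A^T: the derivative of AX with respect to X is a fourth-order tensor D[i,j,k,l] = A[i,k] * delta[j,l], and contracting its two output dimensions (the generalized trace) recovers exactly A^T, so 5.113 does hold once the trace is read as a tensor contraction.

```python
import numpy as np

# Check of 5.113 for f(X) = tr(A X), X a square matrix.
# Matrix Cookbook: d tr(AX)/dX = A^T.
rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
I = np.eye(n)

# Derivative of AX wrt X as a 4th-order tensor:
# Dten[i, j, k, l] = d(AX)[i, j] / dX[k, l] = A[i, k] * delta[j, l]
Dten = np.einsum('ik,jl->ijkl', A, I)

# Generalized trace: contract the two output dimensions (i, j).
lhs = np.trace(Dten, axis1=0, axis2=1)

# This matches the Matrix Cookbook gradient A^T.
assert np.allclose(lhs, A.T)
```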
Personally, if there were a footnote for 5.113 and 5.114 explaining that applying them directly with respect to a matrix requires tensor-related definitions, that would probably be sufficient/less confusing for me. I initially thought I could apply one or the other when calculating derivatives without any tensor-related definitions, and the fact that such definitions kept arising led me to conclude that I must be doing something wrong or misunderstanding what is presented in the chapter.
Good point. What do you think about this remark:
Would this be helpful?
Yes, I think it would be, at least for me.
Great! I'll fix this in the next revision and close this issue. Thanks a lot for making this a bit more clear.
Likewise, thank you for the patience.
Describe the mistake It seems odd to me that 5.113 claims the derivative of a trace with respect to a matrix is the trace of the derivative, since the dimensions obviously don't seem to match. Either this is some counter-intuitive form or a mistake. It also differs from the general form in the Matrix Cookbook I found, which the MML book seems to refer to:
*The Matrix Cookbook [ http://matrixcookbook.com ], Kaare Brandt Petersen, Michael Syskind Pedersen, Version: November 15, 2012
Side note A similar thing applies to 5.114. The Matrix Cookbook seems to use a scalar gradient for that one, which makes the dimensions obviously hold.