Upon further thought, I have answered my own question: if A, D are invertible, then M is not guaranteed to be invertible. For example, consider M = ones(2,2), i.e., a 2x2 matrix of ones: its 1x1 blocks A and D both equal 1 and are invertible, yet M is singular.

My original question 2 still stands, and I would very much appreciate clarification as to why (I - inv(A) B inv(D) C) is invertible.

Additionally, I now also understand that Lemma III (and the Woodbury formula) makes an implicit assumption that either (A - B inv(D) C) or (D - C inv(A) B) is already invertible, and the identity simply gives a formula for computing the other. Again, I think the post could clarify this.
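For completeness, here is a quick numpy check of that counterexample, treating M as four 1x1 blocks:

```python
import numpy as np

# M = ones(2, 2) with the 1x1 blocking A = [1], B = [1], C = [1], D = [1].
M = np.ones((2, 2))
A, B = M[:1, :1], M[:1, 1:]
C, D = M[1:, :1], M[1:, 1:]

print(np.linalg.det(A))  # 1.0 -> A is invertible
print(np.linalg.det(D))  # 1.0 -> D is invertible
print(np.linalg.det(M))  # 0.0 -> M is singular

# The Schur complement D - C inv(A) B = 1 - 1*1*1 = 0 is not invertible,
# which is exactly the implicit assumption Lemma III needs.
print(D - C @ np.linalg.inv(A) @ B)  # [[0.]]
```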
Hello Chris,

Your feedback is very much appreciated, thanks! It's been a while since I last worked on my blog, which is a shame; I'm hoping to get back to it within the coming few months though.

With respect to your second question, I agree that it could be clarified. I think both your question 2 and your further comment can be resolved by adding the following reasoning. Writing Ai for inv(A) (and similarly Di, Wi, ...), the proof writes

Mi = (W X; Y Z)

and we add the assumption that both W and Z are invertible (keeping the rest of the proof as is). With that assumption you can do the same reasoning and get, for

Fi = (I - AiBDiC),

that Fi W = Ai. Since W is invertible, Fi = Ai Wi, so Fi is the product of two invertible matrices and is therefore invertible as well. You then finish the reasoning by noting that W and Z have a given form, and that the whole thing holds if that form is invertible, i.e., exactly what you say: A - BDiC and D - CAiB must be invertible.
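For completeness, here is a sketch of the step that gives Fi W = Ai (the "rest of the proof" part), using the block equations from M Mi = I:

```latex
\[
M M^{-1} = I, \qquad
M^{-1} = \begin{pmatrix} W & X \\ Y & Z \end{pmatrix}
\;\Longrightarrow\;
AW + BY = I, \quad CW + DY = 0 \;\Rightarrow\; Y = -D^{-1} C W.
\]
\[
\text{Substituting back:}\quad (A - B D^{-1} C)\,W = I
\;\Longrightarrow\;
\underbrace{(I - A^{-1} B D^{-1} C)}_{F_i}\,W = A^{-1}.
\]
```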
A couple of things I should try to figure out for this to be fully satisfactory:

1. Find an example where M, A, D are invertible but A - BDiC or D - CAiB is not. This is probably not hard to do, but it would help the reader understand that the situation exists and that the lemma doesn't work in that case.
2. Determine whether, if A - BDiC is invertible, D - CAiB necessarily is as well (or not). If you left-multiply the first by Ai and the second by Di, then you end up with I - EF and I - FE for E = AiB and F = DiC. If one is invertible, is the other one necessarily invertible? Not sure, I'll have to think about it a bit more :)

Yeah, my point (2) is true (i.e. (I - EF) is invertible <=> (I - FE) is invertible): it turns out this is well known and there's an easy proof: https://math.stackexchange.com/a/237796
For fun, here's an alternative proof: assume (I - EF) is invertible. If (I - FE) is not invertible, then there's a non-zero vector e such that

(I - FE)e = 0 <=> FEe = e

So (1, e) is an eigenpair of FE. But then EF(Ee) = Ee, so (1, Ee) is a valid eigenpair of EF (observe that e cannot be in ker(E), otherwise FEe = e = 0, and we assumed that e != 0).

Now let's take the invertible matrix (I - EF):

(I - EF)Ee = Ee - EF(Ee) = Ee - Ee = 0

But then Ee != 0 and (I - EF)Ee = 0, which is a contradiction since (I - EF) is invertible. QED.
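As a quick numerical companion to that argument (a sketch, not a proof): the non-zero eigenvalues of EF and FE coincide, which is exactly why I - EF is singular precisely when I - FE is.

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.standard_normal((3, 5))   # E plays the role of AiB
F = rng.standard_normal((5, 3))   # F plays the role of DiC

# The non-zero eigenvalues of EF (3x3) and FE (5x5) coincide.
print(np.sort_complex(np.linalg.eigvals(E @ F)))
print(np.sort_complex(np.linalg.eigvals(F @ E)))  # same values plus extra zeros

# Rescale F so that 1 is an eigenvalue of FE (pick a real eigenvalue of EF;
# a real 3x3 matrix always has one). Then both I - FE and I - EF are singular.
lam = np.linalg.eigvals(E @ F)
lam_real = lam[np.argmin(np.abs(lam.imag))].real
F2 = F / lam_real
print(np.linalg.det(np.eye(5) - F2 @ E))  # ~0 (up to floating point)
print(np.linalg.det(np.eye(3) - E @ F2))  # ~0 (up to floating point)
```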
btw, I had a look around, and the matrix in the requirement that we discuss (A - BDiC) is called the Schur complement of D. I guess I learned that a few years ago and forgot 😅
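For reference, written out in LaTeX (using the common M/D, M/A notation, which isn't used elsewhere in this thread):

```latex
\[
M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \qquad
M/D = A - B D^{-1} C \;\;(\text{Schur complement of } D), \qquad
M/A = D - C A^{-1} B \;\;(\text{Schur complement of } A).
\]
```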
Thanks for your detailed responses! I spent some more time thinking about this problem, and it turns out there are 3 theorems that basically resolve all of the issues above:

Theorem 1. If M is invertible, then (A is invertible iff D - CAiB is invertible).

Theorem 2. If M is invertible, then (D is invertible iff A - BDiC is invertible).

Theorem 3. If A and D are invertible and any one of (M, D - CAiB, or A - BDiC) is invertible, then all of the following statements hold:
- M, D - CAiB, and A - BDiC are invertible
- Z = inv(D - CAiB) = Di + DiC inv(A - BDiC) BDi, and so naturally Z is invertible with Zi = D - CAiB
- W = inv(A - BDiC) = Ai + AiB inv(D - CAiB) CAi, and so naturally W is invertible with Wi = A - BDiC
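A quick numerical sanity check of the two formulas in Theorem 3 (a sketch on random blocks, which are almost surely invertible; the block sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 4, 3
A = rng.standard_normal((p, p))
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
D = rng.standard_normal((q, q))
Ai, Di = np.linalg.inv(A), np.linalg.inv(D)

# Z = inv(D - C Ai B) = Di + Di C inv(A - B Di C) B Di
Z_lhs = np.linalg.inv(D - C @ Ai @ B)
Z_rhs = Di + Di @ C @ np.linalg.inv(A - B @ Di @ C) @ B @ Di
print(np.allclose(Z_lhs, Z_rhs))  # True

# W = inv(A - B Di C) = Ai + Ai B inv(D - C Ai B) C Ai
W_lhs = np.linalg.inv(A - B @ Di @ C)
W_rhs = Ai + Ai @ B @ np.linalg.inv(D - C @ Ai @ B) @ C @ Ai
print(np.allclose(W_lhs, W_rhs))  # True

# W and Z are also the diagonal blocks of inv(M), consistent with Mi = (W X; Y Z).
M = np.block([[A, B], [C, D]])
Mi = np.linalg.inv(M)
print(np.allclose(Mi[:p, :p], W_lhs), np.allclose(Mi[p:, p:], Z_lhs))  # True True
```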
Note that Theorem 3 contradicts your point (1).
I'm currently working on writing up a concise post about these 3 theorems (with proofs!) on my own website: chrisyeh96.github.io. I'll definitely cite your original post as inspiration!
Nice :-)
Edit: for my future self when I get to updating the post, a reformulation of the above points with As = D - CAiB and Ds = A - BDiC:

(1a) M inv, A inv => As inv
(1b) M inv, D inv => Ds inv
(2a) A and As inv => M inv
(2b) D and Ds inv => M inv
(3) A and D inv + one of (As, Ds, M) inv => they all are (Theorem 3, part 1 above)

Proof of (1a): take z = (x1, x2) with As x2 = 0, i.e.

D x2 = CAiB x2

so that

Mz = (A x1 + B x2, C x1 + CAiB x2)

Since A is invertible, we can form x1 = -AiB x2; but for that x1, Mz = 0, and so z = 0 (M inv), so that x2 = 0 and As is invertible.
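Writing that computation out (a sketch in LaTeX, same notation with A_s = D - C A^{-1} B):

```latex
\[
\text{Take } x_2 \text{ with } A_s x_2 = (D - C A^{-1} B)\,x_2 = 0
\quad\text{and set}\quad x_1 = -A^{-1} B x_2,\;\; z = (x_1, x_2).
\]
\[
M z = \begin{pmatrix} A x_1 + B x_2 \\ C x_1 + D x_2 \end{pmatrix}
    = \begin{pmatrix} -B x_2 + B x_2 \\ D x_2 - C A^{-1} B x_2 \end{pmatrix}
    = \begin{pmatrix} 0 \\ A_s x_2 \end{pmatrix}
    = 0.
\]
\[
\text{Since } M \text{ is invertible, } z = 0, \text{ so } x_2 = 0;
\text{ hence } \ker(A_s) = \{0\} \text{ and } A_s \text{ is invertible.}
\]
```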
Proof of (1b) is identical. Proof of (3) is trivially implied by the others. (2a) and (2b) are symmetric, so we only need to prove (2a). Proceed ad absurdum: assume there's a non-zero z = (x1, x2) such that Mz = 0 (so M not inv). Then

A x1 + B x2 = 0
C x1 + D x2 = 0

Since A is inv, we can write x1 = -AiB x2, and the second equation becomes

(D - CAiB) x2 = 0

or As x2 = 0, so that x2 = 0 (As is inv), but then x1 = 0 as well, contradicting z != 0.
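A small numerical illustration of the null-vector construction used in (1a)/(2a) (a sketch; the sizes and the way the singular As is built are arbitrary choices): if A is invertible but As = D - CAiB is singular, then z = (-AiB x2, x2) with x2 in ker(As) is a non-zero null vector of M, so M is singular.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 4, 3
A = rng.standard_normal((p, p))
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, p))
Ai = np.linalg.inv(A)

# Build a singular As (rank q-1) and define D so that D - C Ai B = As.
As = rng.standard_normal((q, q - 1)) @ rng.standard_normal((q - 1, q))
D = As + C @ Ai @ B

# x2 spans ker(As): the right singular vector for the (near-)zero singular value.
x2 = np.linalg.svd(As)[2][-1]
x1 = -Ai @ B @ x2
z = np.concatenate([x1, x2])

M = np.block([[A, B], [C, D]])
print(np.allclose(As @ x2, 0))  # True: x2 is in ker(As)
print(np.allclose(M @ z, 0))    # True: z is a non-zero null vector of M
# Smallest singular value of M is at machine-precision level:
# M is singular, as the contrapositive of (1a) predicts.
print(np.linalg.svd(M, compute_uv=False)[-1])
```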
Finished writing up my own post: https://chrisyeh96.github.io/2021/05/19/schur-complement.html
very nice :)
Hi Thibaut,
First off, I really like your CS/ML notes! I'm slowly making my way through them, as most of the topics will likely be relevant in my own research.
In your derivation of the matrix inversion lemmas, I have a couple of questions:
1) You write that Lemma III (essentially the Woodbury formula) holds whenever A, D are invertible matrices and B, C are matrices of appropriate size. However, the derivation is made under the additional assumption that the entire matrix M is invertible. Is it true that if A and D are invertible, then M is guaranteed to be invertible?
2) How exactly do you go from equations (4) to equations (5)? In particular, it seems like (I - inv(A) B inv(D) C) must be invertible. How do you know that this quantity is actually invertible?

Many thanks in advance for clarifying! Keep the notes coming!
Best, Chris