tlienart / tlienart.github.io

Code for my website, powered by Franklin.jl
https://tlienart.github.io
9 stars 5 forks source link

Matrix Inversion Lemmas - Missing Assumptions? #38

Closed chrisyeh96 closed 3 years ago

chrisyeh96 commented 3 years ago

Hi Thibaut,

First off, I really like your CS/ML notes! I'm slowly making my way through them, as most of the topics will likely be relevant in my own research.

In your derivation of the matrix inversion lemmas, I have a couple of questions:

1) You write that Lemma III (essentially the Woodbury formula) holds whenever A, D are invertible matrices and B, C are matrices of appropriate size. However, the derivation is made under the additional assumption that the entire matrix M is invertible. Is it true that if A and D are invertible, then M is guaranteed to be invertible?

2) How exactly do you go from equations (4) to equations (5)? In particular, it seems like (I - inv(A) B inv(D) C) must be invertible. How do you know that this quantity is actually invertible?

An equivalent question is whether `(A - B inv(D) C)` is always invertible whenever A, D are invertible.

Many thanks in advance for clarifying! Keep the notes coming!

Best, Chris

chrisyeh96 commented 3 years ago
  1. Upon further thought, I have answered my own question: If A,D are invertible, then M is not necessarily guaranteed to be invertible. For example, consider M = ones(2,2), i.e., a 2x2 matrix of ones.

  2. My original question 2 still stands, and I would very much appreciate clarification as to why (I - inv(A) B inv(D) C) is invertible.

Additionally, I now also understand that Lemma III (and the Woodbury formula) makes an implicit assumption that either (A - B inv(D) C) or (D - C inv(A) B) is already invertible, and the identity simply gives a formula for computing the other. Again, I think the post could clarify this.

tlienart commented 3 years ago

Hello Chris,

Your feedback is very much appreciated, thanks! it's been a while that I haven't worked on my blog and it's a shame, I'm hoping to get back to it within the coming few months though.

With respect to your second question, I agree that it could be clarified, I think both your question 2 and your further comment can be solved by adding the following reasoning; writing Ai for inv(A),

if you add that then you can do the same reasoning, have:

and then you finish the reasoning saying that W and Z have a given form and that the whole thing holds if that form is invertible i.e. exactly what you say so A-BDiC and D-CAiB must be invertible.

A couple of things I should try to figure out for this to be fully satisfied:

  1. find an example where M, A, D are invertible but not A - BDiC or not D-CAiB, this is probably not hard to do but would help the reader understand that the situation exists and that the lemma doesn't work in that case,
  2. I'm wondering if we assume that A - BDiC is invertible then D-CAiB necessarily is as well or not. If you left multiply the first by Ai and the second by Di then you end up with I - EF and I-FE for E=AiB and F=DiC. If one is invertible, is the other one necessarily invertible? Not sure, I'll have to think about it a bit more :)
tlienart commented 3 years ago

Yeah my point (2) is true. (i.e. (I-EF) is invertible <=> (I-FE) is invertible)): turns out this is well known and there's an easy proof: https://math.stackexchange.com/a/237796

tlienart commented 3 years ago

For fun here's an alternative proof: assume (I-EF) is invertible, if (I-FE) is not invertible then there's a non-zero vector e such that

(I-FE)e = 0
<=> FEe = e

So (1,e) is an eigenpair of FE but then EF(Ee) = Ee and (1, Ee) is valid eigenpair (observe that e cannot be in ker(E) otherwise FEe=e=0 and we assumed that e!=0).

Now let's take the invertible matrix (I-EF):

(I-EF)Ee = Ee - Ee = 0

but then Ee != 0 and (I-EF)Ee = 0 which is a contradiction since (I-EF) is invertible. QED.

tlienart commented 3 years ago

btw, I had a look around, and the matrix in the requirement that we discuss (A-BDiC) is called the Schur complement of D. I guess I learned that a few years ago and forgot 😅

chrisyeh96 commented 3 years ago

Thanks for your detailed responses! I spent some more time thinking about this problem, and it turns out there are 3 theorems that basically resolve all of the issues above:

Theorem 1. If M is invertible, then (A is invertible iff D - CAiB is invertible).

Theorem 2. If M is invertible, then (D is invertible iff A - BDiC is invertible).

Theorem 3. If A and D are invertible and any one of (M, D - CAiB, or A - BDiC) is invertible, then all of the following statements hold:

Note that Theorem 3 contradicts your point (1).

I'm currently working on writing up a concise post about these 3 theorems (with proofs!) on my own website: chrisyeh96.github.io. I'll definitely cite your original post as inspiration!

tlienart commented 3 years ago

Nice :-)

Edit: for my future self when I get to updating the post, reformulation of above points with As = D-CAiB and Ds = A-BDiC:

Proof of (1a), take z = (x1, x2) with As x2 = 0

Dx2 = CAiBx2
Mz = (Ax1 + Bx2 , Cx1 + CAiBx2)

since A is invertible, we can form x1 = -AiBx2, but for that x1, Mz=0 and so z=0 (M inv) so that x2=0 so that As is invertible.

Proof of (1b) is identical. Proof of (3) is trivially implied by the others. (2a) and (2b) are symmetric so we only need to prove (2a). Proceed ad absurdum: Assume there's z=(x1,x2) non zero such that Mz=0 (so M not inv) then

Ax1 + Bx2 = 0
Cx1 + Dx2 = 0

Since A is inv, we can write x1=-AiBx2 and the second equation becomes

(D - CAiB)x2 = 0

or As x2=0 so that x2=0 (As is inv) but then x1=0 as well.

chrisyeh96 commented 3 years ago

Finished writing up my own post : https://chrisyeh96.github.io/2021/05/19/schur-complement.html

tlienart commented 3 years ago

very nice :)