ma-gilles / recovar

GNU General Public License v3.0

Question about subspace_angles function #1

Closed DSilva27 closed 1 month ago

DSilva27 commented 1 month ago

Dear Marc, I have learned a lot reading your code, thank you for making it available.

I have a couple of questions about the functions

import numpy as np

def find_angle_between_subspaces(v1, v2, max_rank):
    # Cross-product of the two (assumed orthonormal) bases; its singular
    # values are the cosines of the principal angles between the subspaces.
    ss = np.conj(v1[:, :max_rank]).T @ v2[:, :max_rank]
    s, v, d = np.linalg.svd(ss)
    if np.any(v > 1.2):
        print('v too big!')
    v = np.where(v < 1, v, 1)  # clip numerical overshoot above 1
    # Sine of the largest principal angle between the two subspaces.
    return np.sqrt(1 - v[-1]**2)

def subspace_angles(u, v, max_rank=None, check_orthogonalize=False):
    max_rank = u.shape[-1] if max_rank is None else max_rank
    corr = np.zeros(max_rank)
    if check_orthogonalize:
        # Re-orthogonalize the input bases before comparing them.
        u, _ = np.linalg.qr(u)
        v, _ = np.linalg.qr(v)

    for k in range(1, max_rank + 1):
        if k > u.shape[-1]:
            corr[k-1] = 1
        else:
            corr[k-1] = find_angle_between_subspaces(u[:, :k], v[:, :k], max_rank=k)
    return corr

  1. Why do you compute the sines of the angles by setting a max rank and then iterating? What is the difference between this and simply running find_angle_between_subspaces(v1, v2, max_rank=rank(v1)) and returning all the sine principal angles instead of just the last one?

  2. I am also confused by relating the sine principal angles to the correlation (assuming that's what the variable corr represents). Wouldn't a sine of 1 represent a correlation of 0, and a sine of 0 represent a correlation of 1? Why use the sines instead of the cosines?

Thank you,

David

PS: I apologize if this is not the best way to ask this question; I will remove the issue if that is the case.

ma-gilles commented 1 month ago

Hi David,

I'm glad you found it useful, and I think this is a great forum for this question!

I think you know this already, but I'll explain in case someone else reads this: why not just compare the first estimated principal component to the first true principal component, the second estimate to the second true one, and so on?

The issue is that principal components/eigenvectors are sometimes not uniquely defined. E.g., if two eigenvalues are equal, there is no way to uniquely define an eigenvector: there are really two “first” eigenvectors (u_1, u_2), or more precisely, a two-dimensional invariant subspace. If we force a one-to-one comparison and arbitrarily declare u_1 to be the correct one, there is no reason the estimate can't return u_2. Since u_1 is orthogonal to u_2, we would conclude we got 100% error, whereas we really got 0% error (in the sense that we found a vector that exactly minimizes the objective function we care about). This is particularly bad when estimating from data, because our estimated eigenvectors will likely be a combination of the top true eigenvectors, since we can't recover anything exactly.
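A tiny numerical illustration of this (my own toy example, not code from the repository): with a repeated top eigenvalue, the vector-to-vector comparison reports 100% error even though the estimate sits exactly inside the true invariant subspace.

import numpy as np

# Two equal top eigenvalues: the leading eigenvector is not unique.
u1 = np.array([1.0, 0.0, 0.0])            # one valid "first" eigenvector
u2 = np.array([0.0, 1.0, 0.0])            # an equally valid one
est = u2                                  # the estimator may legitimately return u2 (or any mix)

print(abs(u1 @ est))                      # 0.0: one-to-one comparison reports 100% error
# Yet est lies exactly in the true top two-dimensional subspace span{u1, u2}:
U = np.stack([u1, u2], axis=1)
print(np.allclose(U @ (U.T @ est), est))  # True: zero subspace error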

So it makes more sense to look at metrics between subspaces, say S1 and S2. I chose the “largest principal angle” metric (alg. 6.4.3 in Golub & Van Loan), which computes: what is the vector in S2 that is most orthogonal to all of S1? This has the nice property that it is not a function of the ordering of the eigenvectors. So I am computing how similar the first two-dimensional subspaces are, then the first three-dimensional subspaces, the first four-dimensional subspaces, etc.

I believe that is not the same as computing all the principal angles between the first, say, 20 dimensions. But that doesn't mean it's not a valid metric. The reason I don't compute that is simply that I didn't think about it; it could be a nice metric too.
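To make the distinction concrete, here is a rough sketch (my own, assuming SciPy is available; U and V are random stand-ins for the true and estimated bases): the loop in subspace_angles reports the sine of the largest principal angle for each nested pair of j-dimensional subspaces, while a single call on the full bases returns all principal angles of the two max_rank-dimensional subspaces.

import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)
n, k = 100, 5
U, _ = np.linalg.qr(rng.standard_normal((n, k)))   # stand-in for the true basis
V, _ = np.linalg.qr(rng.standard_normal((n, k)))   # stand-in for the estimated basis

# What the loop above reports: sine of the *largest* principal angle for each
# nested pair of j-dimensional subspaces, j = 1, ..., k.
nested = [np.sin(subspace_angles(U[:, :j], V[:, :j]).max()) for j in range(1, k + 1)]

# All principal angles between the two k-dimensional subspaces, in one call.
all_k = np.sin(subspace_angles(U, V))

print(np.round(nested, 3))
print(np.round(all_k, 3))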

That being said, I think neither of these metrics is ideal, and I used a new one in the new manuscript (on arXiv tomorrow, hopefully, but I can send it to you now if you want). The reason I don't love them is that they both disregard the eigenvalues, which is not great because we don't care about estimating an eigenvector that has a tiny eigenvalue (in cryo-EM, at least). Instead, I use the captured variance, which computes how much of the true variance is captured in the subspace spanned by the first K estimated eigenvectors (and its normalized version). This is what is computed here. I think it better captures what we care about.
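In symbols (my paraphrase of the description above; the precise definition appears later in this thread), with $U_K$ the first $K$ estimated eigenvectors and $\Sigma$ the true covariance, the captured variance is

$$ \frac{| (U_K^* \Sigma U_K)^{1/2} |_F^2}{| \Sigma^{1/2} |_F^2} = \frac{\operatorname{tr}(U_K^* \Sigma U_K)}{\operatorname{tr}(\Sigma)}. $$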

  1. This is just terrible naming: corr is the sine of the principal angle, not the correlation. I have no idea why I named it that! I think the cosine would be fine too (it would just need to return v[-1] instead of np.sqrt(1 - v[-1]**2)). I just like having a “0” error give a score of 0.

DSilva27 commented 1 month ago

Thank you for your answer! That book looks like something I should read, thank you for sharing it.

I share the feeling about the angles not being ideal, as they don't take the singular values into account. I have some questions about the captured variance metric, but I will wait until I have read the new manuscript to ask them.

Just one small question for now. Assuming test_v is the basis you are testing against, U is the basis you want to test, and sqrt(s) are the singular values associated with U: wouldn't this metric have the issue of not being symmetric? For example, say you are comparing two different singular value decompositions. Would it make sense to have s be an average of the singular values of both decompositions (with some scaling to make sure that one does not dominate the other because of normalization)?

ma-gilles commented 1 month ago

Yes, you got the definitions right. And yes it's not symmetric. I've only used it to compare against synthetic datasets so that hasn't come up.

I would probably use $d(V, U, s_U) + d(U, V, s_V)$, where $d(V, U, S) = | V^* U S^{1/2} |_F^2$, which is also equal to the sum of the singular values of $V^* \Sigma V$, where $\Sigma = U S U^*$.
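A quick numerical check of that identity (my own sketch, with random orthonormal U, V and positive entries s):

import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 4
U, _ = np.linalg.qr(rng.standard_normal((n, k)))
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
s = rng.uniform(0.5, 2.0, size=k)               # positive "eigenvalues"
Sigma = U @ np.diag(s) @ U.T

lhs = np.linalg.norm(V.T @ U @ np.diag(np.sqrt(s)), 'fro')**2
rhs = np.linalg.svd(V.T @ Sigma @ V, compute_uv=False).sum()
print(np.isclose(lhs, rhs))                     # True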

By the way, I realized that what I wrote in the preprint is not correct (that would be the sum of the squares of the singular values), so thanks for asking about it!

DSilva27 commented 1 month ago

I think I am a bit confused.

I was reading the new version of the manuscript, and couldn't help but notice that in Figure A.6 the captured variance is defined as (in the notation of this thread) $| V^* \Sigma V |_F^2 / | \Sigma |_F^2$. Now, let's say that we did not compute an approximate basis but computed all $N^3$ vectors. Wouldn't this always return 1, since $| V^* \Sigma V |_F^2 = | \Sigma |_F^2$?

In the case of the paper, let's say we complete $U_k$ with any other $N^3 - k$ vectors that are orthogonal to $U_k$. Then the same thing would happen. Is this an intended feature of this metric?

For the case in your previous comment, we would have $| V^* U S^{1/2} |_F^2 = |S^{1/2}|_F^2$, so the subspace we are testing (the one given by V) would be irrelevant to the metric.

The reason I started thinking about this is that I noticed that the $d(V, U, s_U)$ defined above is not the same as the definition in Figure A.6 of the paper. Is it that this metric only makes sense if you have a truncated basis?

ma-gilles commented 1 month ago

I think everything you said is correct, and it only makes sense with a truncated basis. But I think that is the case for every metric on subspaces. E.g., any two n-dimensional subspaces of R^n are just all of R^n, and thus would have a "largest principal angle" of 0. In this case, it would be interpreted as 100% of the variance being contained in R^n, which makes a good amount of sense.
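For what it's worth, a quick check of that point (my own sketch, using the trace form of the captured variance): any full orthonormal basis of R^n captures 100% of the variance, whatever the covariance is.

import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))
Sigma = A @ A.T                                   # arbitrary full-rank covariance

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # any full orthonormal basis of R^n
captured = np.trace(Q.T @ Sigma @ Q) / np.trace(Sigma)
print(np.isclose(captured, 1.0))                  # True: a full basis captures everything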

Yes, there is an error in the preprint; that's what I meant in the previous comment... This is what is in the newest version (online at some point), which I think is correct:

The accuracy of the principal component is described using the percent of the captured variance, computed as

$$ | (U_k^* \Sigma U_k)^{1/2} |_F^2 \big / | \Sigma^{1/2}|_F^2 $$

where $U_k$ is the matrix containing the first $k$ estimated principal components, $\Sigma$ is the true covariance matrix, and $\Sigma^{1/2}$ is its matrix square root, so that

$$ | A^{1/2} |_F^2 = \sum \sigma_i( A^{1/2})^2 = \sum \sigma_i(A) $$

where $\sigma_i(A)$ are the singular values of $A$.

If you force $U_k$ to have dimension $k$, then this is maximized by the first $k$ singular vectors (the 'exact' curve in the paper). The motivation is that the only thing I care about for my work is how much of the variance is captured in the estimated subspace; maybe it's not so relevant to what you're doing.
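Here is a small sketch of that computation (my own code, not taken from recovar; it assumes the true covariance $\Sigma$ is available as a dense matrix and uses the equivalent trace form for a PSD $\Sigma$), along with a check that the top-$k$ true eigenvectors maximize it:

import numpy as np

def percent_captured_variance(U_k, Sigma):
    # | (U_k^* Sigma U_k)^{1/2} |_F^2 / | Sigma^{1/2} |_F^2,
    # which equals trace(U_k^* Sigma U_k) / trace(Sigma) for PSD Sigma.
    return np.trace(U_k.conj().T @ Sigma @ U_k).real / np.trace(Sigma).real

rng = np.random.default_rng(3)
n, k = 30, 3
eigvals = np.sort(rng.uniform(0.1, 5.0, size=n))[::-1]
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
Sigma = Q @ np.diag(eigvals) @ Q.T

exact = percent_captured_variance(Q[:, :k], Sigma)           # top-k true eigenvectors
random_basis, _ = np.linalg.qr(rng.standard_normal((n, k)))
estimate = percent_captured_variance(random_basis, Sigma)

print(np.isclose(exact, eigvals[:k].sum() / eigvals.sum()))  # True: the 'exact' curve value
print(estimate <= exact)                                     # True: top-k maximizes the metric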

DSilva27 commented 1 month ago

Thank you! This is actually exactly what I need. Sorry about the confusion; I just wanted to make sure I understood the meaning of the metric. The explanation about how $R^n$ captures all the variance makes a lot of sense. Also, I really enjoyed reading through the new preprint.

I think I have no more questions regarding this. I will not close the issue in case you want to keep it open for visibility.