kingjr / b2b


Proof: Ê < 1 even with higher SNR #2

Open kingjr opened 5 years ago

kingjr commented 5 years ago

Ê does not necessarily reach 1; it is affected by the SNR:

import numpy as np
from scipy.linalg import pinv

def ols(X, Y):
    # least-squares solution B of X @ B = Y
    return pinv(X.T @ X) @ X.T @ Y

n = 10000
Cx = np.array([[1, -.8], [-.8, 1.]])  # correlated inputs
X = np.random.multivariate_normal(np.zeros(2), Cx, n)
E = np.array([[1, 0], [0, 0]])  # true causal pattern: only the first feature of X drives Y
N = np.random.randn(n, 2)  # additive noise (nsr = 1, since the signal has unit variance)
F = np.random.randn(2, 2)  # random mixing of signal + noise into Y
Y = (X @ E + N) @ F

# JRR: back-to-back regression on split halves
G = ols(Y[::2], X[::2])  # decode X from Y on the first half
H = ols(X[1::2], Y[1::2] @ G)  # regress the decoded X-hat = Y @ G onto the true X on the second half
print(np.around(np.diag(H), 3))  # diag(H) is the estimate Ê of diag(E)

[0.511, 0.002]
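To make the SNR dependence explicit, here is a minimal sketch of the same experiment with an added noise scale s (the scale s and the sweep values are assumptions for illustration, not part of the script above); diag(H)[0] should shrink as s grows:

import numpy as np
from scipy.linalg import pinv

def ols(X, Y):
    # least-squares solution B of X @ B = Y
    return pinv(X.T @ X) @ X.T @ Y

n = 100000
Cx = np.array([[1, -.8], [-.8, 1.]])
E = np.array([[1, 0], [0, 0]])
F = np.random.randn(2, 2)

for s in (0.25, 0.5, 1.0, 2.0, 4.0):  # assumed noise scales
    X = np.random.multivariate_normal(np.zeros(2), Cx, n)
    N = s * np.random.randn(n, 2)  # nsr = s, since the signal has unit variance
    Y = (X @ E + N) @ F
    G = ols(Y[::2], X[::2])
    H = ols(X[1::2], Y[1::2] @ G)
    print(s, np.around(np.diag(H), 3))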

lopezpaz commented 5 years ago

This is as expected, no?


kingjr commented 5 years ago

It seems intuitive to me, but I don't know.


f-charton commented 5 years ago

When F is invertible, X and F are scaled, and the covariance is not too high, an empirical formula seems to be k = 1/(1 + nsr^2), nsr^2 being the square of the noise-to-signal ratio. Here, nsr = 1, so we expect k = 0.5.

The existence of the scaling is in the proof. The rationale for the empirical value goes as follows: we know that in the presence of noise, the first regression retrieves k(FE)# instead of (FE)# (M# being the pseudo-inverse of M), and k is chosen to minimize

norm2(I - k(FE)#FE) norm2(X) + norm2(k(FE)#FN),

norm2 being the squared norm. The first term is (1-k)^2 norm2(X), and the second is k^2 norm2(N) = k^2 nsr^2 norm2(X), so we are minimizing (1-k)^2 + k^2 nsr^2. Zeroing the derivative over k gives 2(k - 1) + 2k nsr^2 = 0, i.e. k = 1/(1 + nsr^2).
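A quick grid check of that last minimization (the scalar objective only, for a few assumed values of nsr^2):

import numpy as np

for nsr2 in (0.25, 1.0, 4.0):  # assumed squared noise-to-signal ratios
    ks = np.linspace(0., 1., 100001)
    loss = (1 - ks) ** 2 + ks ** 2 * nsr2
    # argmin of the objective vs. the closed form 1/(1 + nsr^2)
    print(nsr2, ks[loss.argmin()], 1 / (1 + nsr2))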

I need to check it, and see how it scales to larger dimensions, non-invertible F, etc...