The code is here. Actually, the CCA score is 100, but the cca_loss is not. If I do not use the MLP, the two printed results are the same (99).
x = torch.randn((2000, 100))
x_p = MLP_Project(x, insize=100, outsize=100)
case_1 = [x_p, x_p]
case_2 = [torch.FloatTensor(x_p), torch.FloatTensor(x_p)]
print(-cca_loss(case_2))
print(sum(CCA(100).fit(case_1).score(case_1)))
This code is insufficient to recreate the problem.
If I use cca_zoo.models.CCA and cca_zoo.deepmodels.objectives.CCA I observe approximately the same loss.
Your code does not make clear what cca_loss or CCA actually are. I think the bug is somewhere in your code.
Sorry, here is the code. I wonder why cca_loss([x_p, x_p]) and cca_loss([x, x]) are different, since the value of cca_loss([x_p, x_p]) is really strange.
from cca_zoo.models import CCA, KCCA, KGCCA, MCCA
import numpy as np
import pdb
import torch
import torch.nn as nn
import tensorly as tl
from tensorly.cp_tensor import cp_to_tensor
from tensorly.decomposition import parafac
import dcor

class MLP(nn.Module):
    def __init__(self, insize, outsize):
        super(MLP, self).__init__()
        # Four-layer ReLU network projecting insize -> outsize
        self.project = nn.Sequential(
            nn.Linear(insize, 100), nn.ReLU(),
            nn.Linear(100, 100), nn.ReLU(),
            nn.Linear(100, 100), nn.ReLU(),
            nn.Linear(100, outsize),
        )

    def forward(self, x):
        x = self.project(x)
        return x
def Initialize_Seed(seed=2):
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.benchmark = False
    # os.environ["PYTHONHASHSEED"] = str(seed)
def _mat_pow(mat, pow_, epsilon):
    # Raise a symmetric matrix to a (possibly fractional/negative) power via eigendecomposition
    [D, V] = torch.linalg.eigh(mat)
    mat_pow = V @ torch.diag((D + epsilon).pow(pow_)) @ V.T
    mat_pow[mat_pow != mat_pow] = epsilon  # For stability: replace NaNs
    return mat_pow
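# (Sanity check, illustrative and not part of the original issue: for a
# well-conditioned SPD matrix and eps = 0, _mat_pow with power -0.5 should act
# as an inverse square root, i.e. R @ M @ R.T should be close to the identity.)
A_check = torch.randn(500, 10)
M_check = A_check.T @ A_check / 500 + 0.1 * torch.eye(10)  # SPD by construction
R_check = _mat_pow(M_check, -0.5, 0.0)
print(torch.allclose(R_check @ M_check @ R_check.T, torch.eye(10), atol=1e-3))  # expect True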
def _demean(views):
    return tuple([view - view.mean(dim=0) for view in views])
def cca_loss(views, dim=100):
    n = views[0].shape[0]
    views = _demean(views)
    # Concatenate all views and from this get the cross-covariance matrix
    all_views = torch.cat(views, dim=1)
    C = all_views.T @ all_views / (n - 1)
    # Get the block covariance matrix placing X_i^T X_i on the diagonal (r = 0 here)
    D = torch.block_diag(
        *[
            (1 - 0.0) * m.T @ m / (n - 1)
            + 0.0 * torch.eye(m.shape[1], device=m.device)
            for m in views
        ]
    )
    C = C - torch.block_diag(*[view.T @ view / (n - 1) for view in views]) + D
    R = _mat_pow(D, -0.5, 1e-3)
    # In MCCA the eigenvalue problem is Cv = lambda Dv
    C_whitened = R @ C @ R.T
    eigvals = torch.linalg.eigvalsh(C_whitened)
    # Sort eigenvalues so the largest come first
    idx = torch.argsort(eigvals, descending=True)
    eigvals = eigvals[idx[:dim]]
    # Leaky ReLU encourages the gradient to be driven by positively correlated dimensions while also
    # encouraging dimensions associated with spurious negative correlations to become more positive
    eigvals = torch.nn.LeakyReLU()(eigvals[torch.gt(eigvals, 0)] - 1)
    corr = eigvals.sum()
    return -corr
Initialize_Seed()
x = torch.randn((2000, 100))
net = MLP(insize=100, outsize=100)

x_p = net(x)
case_1 = [x_p, x_p]
case_2 = [x_p.detach().numpy(), x_p.detach().numpy()]
print(-cca_loss(case_1))
print(sum(CCA(100).fit(case_2).score(case_2)))

case_1 = [x, x]
case_2 = [x.numpy(), x.numpy()]
print(-cca_loss(case_1))
print(sum(CCA(100).fit(case_2).score(case_2)))
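# (Illustrative diagnostic, not part of the original script: compare the scale
# of the covariance eigenvalues of the MLP output against the eps = 1e-3 used
# inside _mat_pow; if they are of similar magnitude, the regularisation term
# dominates the whitening.)
with torch.no_grad():
    z = net(x)
    z = z - z.mean(dim=0)
    cov = z.T @ z / (z.shape[0] - 1)
    evals = torch.linalg.eigvalsh(cov)
    print(evals.min().item(), evals.max().item())  # compare with the 1e-3 passed to _mat_pow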
test.zip: here is the Python file, please take a look.
In my environment, the output is as follows:
tensor(7.2112, grad_fn=
Ignore my previous comments. I misidentified the problem. It is actually due to the eps in the function:
R = _mat_pow(D, -0.5, 1e-3)
If you use eps=0 then all should be fine.
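For reference, here is a minimal sketch of that comparison. cca_loss_eps is a hypothetical variant of the cca_loss posted above (with its r = 0.0 terms dropped and the epsilon exposed as an argument), not part of cca-zoo, and it reuses x_p and _mat_pow from the script:
# Hypothetical variant of cca_loss with eps threaded through to _mat_pow
def cca_loss_eps(views, dim=100, eps=1e-3):
    n = views[0].shape[0]
    views = _demean(views)
    all_views = torch.cat(views, dim=1)
    C = all_views.T @ all_views / (n - 1)
    # Block-diagonal matrix holding each view's covariance
    D = torch.block_diag(*[m.T @ m / (n - 1) for m in views])
    R = _mat_pow(D, -0.5, eps)
    eigvals = torch.linalg.eigvalsh(R @ C @ R.T)
    idx = torch.argsort(eigvals, descending=True)
    eigvals = eigvals[idx[:dim]]
    eigvals = torch.nn.LeakyReLU()(eigvals[torch.gt(eigvals, 0)] - 1)
    return -eigvals.sum()

print(-cca_loss_eps([x_p, x_p], eps=1e-3))  # roughly the 7.2 reported above
print(-cca_loss_eps([x_p, x_p], eps=0.0))   # should move close to the CCA score (about 100)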
Wow, thank you so much!!! Can you give some advice about the eps? It is really common that we need DCCA to handle embeddings obtained from a deep network. Maybe the default value of eps should be 0?
And when I set eps = 1e-10, the result is also fine.
Glad everything at least works how I'd expect.
In order to default to zero, I'd want to be convinced it didn't cause training to become unstable, even though I agree the behaviour you observe here is undesirable/unexpected.
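(One possible compromise, sketched here only as an illustration and not something cca-zoo currently does: scale the regularisation relative to the eigenvalue magnitude rather than using an absolute constant, so a small-variance embedding is not swamped by a fixed eps.)
# Sketch of a relative regularisation (illustrative only, not the library's API)
import torch

def _mat_pow_relative(mat, pow_, rel_eps=1e-6):
    D, V = torch.linalg.eigh(mat)
    eps = rel_eps * D.max().clamp(min=0)  # eps scales with the largest eigenvalue
    mat_pow = V @ torch.diag((D + eps).pow(pow_)) @ V.T
    mat_pow[mat_pow != mat_pow] = 0.0  # replace any NaNs for stability
    return mat_pow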
Happy to have a video call and hear what you're working on, if I can be of any help.
Wow, it is my honor to have a video call with you. How about setting a suitable time by email? My email address is junlin.he@polyu.edu.hk.
Dear author, thanks again for your sincere help, but I have found some strange problems. I generate X by torch.randn, which is a 2000x100 embedding. I use a randomly initialized network (MLP) to project X, with output dimension 100, so I get Z = MLP(X). The DCCA loss of [X, X] is -99, while the DCCA loss of [Z, Z] is -6. I am very confused about this result; could you please give me some advice?