jameschapman19 / cca_zoo

Canonical Correlation Analysis Zoo: A collection of Regularized, Deep Learning based, Kernel, and Probabilistic methods in a scikit-learn style framework
https://cca-zoo.readthedocs.io/en/latest/
MIT License

The DCCA loss of two equal views is -6 (dim=100). #165

Closed Umaruchain closed 1 year ago

Umaruchain commented 1 year ago

Dear author, thanks again for your sincere help, but I have found a strange problem. I generate X with torch.randn as a 2000×100 embedding, and I use a randomly initialized network (MLP) to project X to an output dim of 100, so Z = MLP(X). The DCCA loss of [X, X] is -99, while the DCCA loss of [Z, Z] is -6. I am very confused by this result; could you please give me some advice?

Umaruchain commented 1 year ago

The code is here. Actually, the CCA score is 100, but the cca_loss is not. If I do not use the MLP, the two printed results are the same (99).

```python
x = torch.randn((2000, 100))
y = torch.randn((2000, 100))

case_1 = [MLP_Project(x, insize=100, outsize=100), MLP_Project(y, insize=100, outsize=100)]

x_p = MLP_Project(x, insize=100, outsize=100)
case_1 = [x_p, x_p]

pdb.set_trace()

case_2 = [torch.FloatTensor(x_p), torch.FloatTensor(x_p)]
print(-cca_loss(case_2))
print(cca_loss_chat(case))
print(loss(case))
print(sum(CCA(100).fit(case_1).score(case_1)))
```

jameschapman19 commented 1 year ago

This code is insufficient to recreate the problem.

If I use cca_zoo.models.CCA and cca_zoo.deepmodels.objectives.CCA, I observe approximately the same loss.
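Roughly the kind of side-by-side check meant here, as a sketch only: the objectives.CCA constructor and its loss(views) method are assumptions about the installed cca_zoo version, while the cca_zoo.models.CCA usage mirrors the snippet above.

```python
import torch
from cca_zoo.models import CCA
from cca_zoo.deepmodels import objectives

x = torch.randn((2000, 100))

# Classical CCA on two identical views: the 100 canonical correlations sum to ~100.
views_np = [x.numpy(), x.numpy()]
model_score = sum(CCA(100).fit(views_np).score(views_np))

# Deep objective on the same data; its loss is (minus) a sum of correlations,
# so negate it for comparison. Constructor/method names are assumptions (see above).
objective_score = -objectives.CCA(100).loss([x, x])

print(model_score, float(objective_score))
```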

jameschapman19 commented 1 year ago

Your code does not make clear what cca_loss or CCA actually are. I think the bug is somewhere in your code.

Umaruchain commented 1 year ago

Sorry, here is the code. I wonder why cca_loss([x_p, x_p]) and cca_loss([x, x]) are different, since the value of cca_loss([x_p, x_p]) is really strange.

```python
from cca_zoo.models import CCA, KCCA, KGCCA, MCCA
import numpy as np
import random
import torch
import torch.nn as nn


class MLP(nn.Module):
    def __init__(self, insize, outsize):
        super(MLP, self).__init__()
        self.project = nn.Sequential(
            nn.Linear(insize, 100),
            nn.ReLU(),
            nn.Linear(100, 100),
            nn.ReLU(),
            nn.Linear(100, 100),
            nn.ReLU(),
            nn.Linear(100, outsize),
        )

    def forward(self, x):
        x = self.project(x)
        return x


def Initialize_Seed(seed=2):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.benchmark = False
    # os.environ["PYTHONHASHSEED"] = str(seed)


def _mat_pow(mat, pow_, epsilon):
    # Compute the matrix to the power of pow_ (pow_ can be negative as well)
    [D, V] = torch.linalg.eigh(mat)
    mat_pow = V @ torch.diag((D + epsilon).pow(pow_)) @ V.T
    mat_pow[mat_pow != mat_pow] = epsilon  # For stability
    return mat_pow


def _demean(views):
    return tuple([view - view.mean(dim=0) for view in views])


def cca_loss(views, dim=100):
    n = views[0].shape[0]

    # Subtract the mean from each view
    views = _demean(views)

    # Concatenate all views and from this get the cross-covariance matrix
    all_views = torch.cat(views, dim=1)
    C = all_views.T @ all_views / (n - 1)

    # Get the block covariance matrix placing X_i^T X_i on the diagonal
    D = torch.block_diag(
        *[
            (1 - 0.0) * m.T @ m / (n - 1)
            + 0.0 * torch.eye(m.shape[1], device=m.device)
            for i, m in enumerate(views)
        ]
    )

    C = C - torch.block_diag(*[view.T @ view / (n - 1) for view in views]) + D

    R = _mat_pow(D, -0.5, 1e-3)

    # In MCCA our eigenvalue problem is Cv = lambda Dv
    C_whitened = R @ C @ R.T

    eigvals = torch.linalg.eigvalsh(C_whitened)

    # Sort eigenvalues so the largest come first
    idx = torch.argsort(eigvals, descending=True)
    eigvals = eigvals[idx[:dim]]

    # The leaky relu encourages the gradient to be driven by positively correlated
    # dimensions while also encouraging dimensions associated with spurious
    # negative correlations to become more positive
    eigvals = torch.nn.LeakyReLU()(eigvals[torch.gt(eigvals, 0)] - 1)

    corr = eigvals.sum()

    return -corr


Initialize_Seed()
x = torch.randn((2000, 100))
net = MLP(insize=100, outsize=100)

x_p = net(x)
case_1 = [x_p, x_p]
case_2 = [x_p.detach().numpy(), x_p.detach().numpy()]
print(-cca_loss(case_1))
print(sum(CCA(100).fit(case_2).score(case_2)))

case_1 = [x, x]
case_2 = [x.numpy(), x.numpy()]
print(-cca_loss(case_1))
print(sum(CCA(100).fit(case_2).score(case_2)))
```

Umaruchain commented 1 year ago

test.zip: here is the Python file, please take a look.

Umaruchain commented 1 year ago

In my environment, the output is as follows:

```
tensor(7.2112, grad_fn=<...>)
99.99999860966659
tensor(99.7890)
99.99999999999187
```

jameschapman19 commented 1 year ago

Ignore my previous comments. I misidentified the problem. It is actually due to the eps in the function:

R = _mat_pow(D, -0.5, 1e-3)

If you use eps=0 then all should be fine.
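To see why the eps matters, here is a minimal, self-contained sketch (the toy covariance below is made up for illustration, it is not data from this issue). Adding eps to the eigenvalues of D before taking the -0.5 power under-whitens any direction whose eigenvalue is comparable to eps, so the correlations that cca_loss sums come out too small. Random Gaussian inputs have covariance eigenvalues near 1, where eps=1e-3 is negligible, but an untrained MLP's outputs can have many tiny eigenvalues, which would explain the -99 vs -6 gap.

```python
import torch


def _mat_pow(mat, pow_, epsilon):
    # Same idea as the helper above: eigendecompose, shift the eigenvalues by
    # epsilon, then raise them to the requested power.
    D, V = torch.linalg.eigh(mat)
    mat_pow = V @ torch.diag((D + epsilon).pow(pow_)) @ V.T
    mat_pow[mat_pow != mat_pow] = epsilon  # replace NaNs for stability
    return mat_pow


# A toy covariance with a wide eigenvalue spread.
cov = torch.diag(torch.tensor([1.0, 1e-2, 1e-4]))

for eps in (1e-3, 1e-10, 0.0):
    R = _mat_pow(cov, -0.5, eps)
    # With exact whitening, R @ cov @ R.T is the identity, so its trace is 3.
    print(f"eps={eps:g}: trace = {torch.trace(R @ cov @ R.T).item():.4f}")
```

With eps=1e-3 the trace is noticeably below 3 because the small-eigenvalue directions are only partially whitened; with eps=1e-10 or 0 it is essentially 3.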

Umaruchain commented 1 year ago

Wow, thank you so much! Can you give some advice about the eps? It is quite common to need DCCA to handle embeddings obtained from a deep network. Maybe the default value of eps should be 0?

Umaruchain commented 1 year ago

And when I set eps = 1e-10, the result is also fine.

jameschapman19 commented 1 year ago

Glad everything at least works how I'd expect.

In order to default to zero, I'd want to be convinced it didn't make training unstable, even though I agree the behaviour you observe here is undesirable/unexpected.

jameschapman19 commented 1 year ago

Happy to have a video call and hear what you're working on if I can be of any help.

Umaruchain commented 1 year ago

Wow, it is my honor to have a video call with you. How about setting a suitable time by email? My email address is junlin.he@polyu.edu.hk.