moskomule / anatome

Ἀνατομή is a PyTorch library to analyze representations of neural networks
MIT License

do you preprocess the matrices for us? #12

Closed brando90 closed 2 years ago

brando90 commented 2 years ago

I just noticed these two paragraphs in the papers I read and was wondering whether you center the matrices, or whether the user has to center them beforehand for anatome to work.

[Two screenshots of the paper excerpts, 2021-09-15]

moskomule commented 2 years ago

Yes https://github.com/moskomule/anatome/blob/c4c069183aca8aad6f73a4b7ab86f7f7e4ca3d04/anatome/similarity.py#L94
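For readers without the repository open: the linked line performs mean-centering. A minimal standalone sketch of such a helper (this is an illustration, not anatome's exact code):

```python
import torch
from torch import Tensor

def zero_mean(input: Tensor, dim: int) -> Tensor:
    # Subtract the mean along `dim`; keepdim=True preserves the
    # shape so the subtraction broadcasts correctly.
    return input - input.mean(dim=dim, keepdim=True)

x = torch.randn(10, 5)
centered = zero_mean(x, dim=0)
# Each column of `centered` now has (numerically) zero mean.
```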

brando90 commented 2 years ago

> Yes
>
> https://github.com/moskomule/anatome/blob/c4c069183aca8aad6f73a4b7ab86f7f7e4ca3d04/anatome/similarity.py#L94

Note that that function doesn't address what the first paper mentions (dividing by the Frobenius norm as well).

So you don't divide by that? If not I am curious why.

Thanks for your replies! I appreciate it.

brando90 commented 2 years ago

I really do think dividing by the Frobenius norm is a good idea @moskomule. E.g., do you really want your similarities to be scale-dependent? Since the std of vectors/matrices is hard to compute (without considering covariances or whitening), the simplest way to remove scale is to divide by the l2 norm (or the Frobenius norm for matrices). The new code would be:

def _zero_mean(input: Tensor,
               dim: int
               ) -> Tensor:
    from torch.linalg import norm
    # Parenthesize so the whole centered matrix is rescaled,
    # not just the mean term (division binds tighter than subtraction).
    return (input - input.mean(dim=dim, keepdim=True)) / norm(input, 'fro')

what do you think?
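As a sanity check, the centre-then-rescale idea can be verified directly: the output should have zero mean along the chosen dimension and unit Frobenius norm. A standalone sketch (hypothetical helper, not anatome code; here the norm of the *centered* matrix is used so the result is exactly unit-norm):

```python
import torch

def center_and_rescale(x: torch.Tensor, dim: int) -> torch.Tensor:
    # Remove the mean along `dim`, then rescale the whole matrix
    # to unit Frobenius norm, making downstream comparisons scale-free.
    centered = x - x.mean(dim=dim, keepdim=True)
    return centered / torch.linalg.norm(centered, 'fro')

x = torch.randn(8, 4) * 100.0  # arbitrary scale
y = center_and_rescale(x, dim=0)
# y has zero column means and unit Frobenius norm.
```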

brando90 commented 2 years ago

If you do change it, let me know so I can pull the new version of anatome and make sure I don't divide by the fro norm too many times. Ideally, dividing by the fro norm is idempotent: a second division does nothing, since the matrix already has unit norm (flattening the matrix into a vector doesn't change its norm, so the same reasoning as for unit vectors applies).
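The idempotence claim is easy to check numerically: once a matrix has unit Frobenius norm, dividing by its norm again is a no-op (a quick sketch):

```python
import torch

x = torch.randn(6, 3)
once = x / torch.linalg.norm(x, 'fro')
twice = once / torch.linalg.norm(once, 'fro')
# `once` already has unit Frobenius norm, so the second
# division divides by ~1.0 and changes nothing.
```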

moskomule commented 2 years ago

Both CCA and CKA are invariant to scale. The recently implemented Procrustes distance is rescaled internally.
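The scale invariance is visible from the definitions: any scalar factor cancels between numerator and denominator. A standalone sketch of linear CKA (an illustration of the invariance, not anatome's implementation):

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Linear CKA on centered features:
    #   ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    num = torch.linalg.norm(y.t() @ x, 'fro') ** 2
    den = torch.linalg.norm(x.t() @ x, 'fro') * torch.linalg.norm(y.t() @ y, 'fro')
    return num / den

x = torch.randn(20, 7)
y = torch.randn(20, 5)
# Rescaling either input leaves the CKA value unchanged:
# the scalars appear squared in both numerator and denominator.
base = linear_cka(x, y)
scaled = linear_cka(3.7 * x, 0.2 * y)
```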

brando90 commented 2 years ago

> Both CCA and CKA are invariant to scale. The recently implemented Procrustes distance is rescaled internally.

Ah, so only the orthogonal Procrustes distance needs the rescaling? Weird, why did the paper mention that all of them require matrix A to be rescaled...?

moskomule commented 2 years ago

I guess it's for simplicity.