moskomule / anatome

Ἀνατομή is a PyTorch library to analyze representations of neural networks
MIT License

Does code work for 1D data? #1

Closed: brando90 closed this issue 3 years ago

brando90 commented 3 years ago

I am using 1D data but I get this error:

RuntimeError: CCAHook currently supports tensors of dimension (2, 4), but got 1 instead.

why?

brando90 commented 3 years ago

sample code:

import torch
import torch.nn as nn
from anatome import SimilarityHook

from collections import OrderedDict

from pathlib import Path

# get init
path_2_init = Path('~/data/logs/logs_Nov17_13-57-11_jobid_416472.iam-pbs/ckpt_file.pt').expanduser()
ckpt = torch.load(path_2_init)
mdl = ckpt['f']

#
Din, Dout = 1, 1
mdl = nn.Sequential(OrderedDict([
    ('fc1_l1', nn.Linear(Din, Dout)),
    ('out', nn.SELU())
]))

#
hook1 = SimilarityHook(mdl, "fc1_l1")
hook2 = SimilarityHook(mdl, "fc1_l1")
mdl.eval()

#
num_samples_per_task = 5
lb, ub = -1, 1
x = torch.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))
with torch.no_grad():
    mdl(x[0])
hook1.distance(hook2, size=8)
moskomule commented 3 years ago

Hi! Anatome supports 1D data (the dimension=2 case). The batch-size dimension is needed to use CCA.
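
For example, your snippet works if you keep the batch dimension when calling the model. Roughly (a sketch based on your code above, with the checkpoint loading left out):

import torch
import torch.nn as nn
from collections import OrderedDict
from anatome import SimilarityHook

Din, Dout = 1, 1
mdl = nn.Sequential(OrderedDict([
    ('fc1_l1', nn.Linear(Din, Dout)),
    ('out', nn.SELU()),
]))

hook1 = SimilarityHook(mdl, "fc1_l1")
hook2 = SimilarityHook(mdl, "fc1_l1")
mdl.eval()

num_samples_per_task = 5
lb, ub = -1, 1
x = torch.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))
with torch.no_grad():
    mdl(x)                     # feed the whole (num_samples_per_task, Din) batch, not a 1D slice like x[0]
hook1.distance(hook2, size=8)  # size only matters for 4D (conv) activations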

brando90 commented 3 years ago

Hi! Anatome supports 1D data (the dimension=2 case). The batch-size dimension is needed to use CCA.

Ok, thanks. Perhaps fixing the tutorial/example would help?

A fully self-contained example like mine would probably be most helpful too.

brando90 commented 3 years ago

@moskomule why am I getting really small values (close to 0)?

I am comparing the same model to itself, so shouldn't the similarity be close to 1.0?

import torch
import torch.nn as nn
from anatome import SimilarityHook

from collections import OrderedDict

from pathlib import Path

# get init
path_2_init = Path('~/data/logs/logs_Nov17_13-57-11_jobid_416472.iam-pbs/ckpt_file.pt').expanduser()
ckpt = torch.load(path_2_init)
mdl = ckpt['f']

#
Din, Dout = 1, 1
mdl = nn.Sequential(OrderedDict([
    ('fc1_l1', nn.Linear(Din, Dout)),
    ('out', nn.SELU())
]))

#
hook1 = SimilarityHook(mdl, "fc1_l1")
hook2 = SimilarityHook(mdl, "fc1_l1")
mdl.eval()

#
num_samples_per_task = 100
lb, ub = -1, 1
x = torch.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))
with torch.no_grad():
    mdl(x)
hook1.distance(hook2, size=8)

output:

2.384185791015625e-07
moskomule commented 3 years ago

I will add examples. It's confusing, but the returned value is a distance (not a similarity), I think.
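
For example, with the hooks from your snippet (a sketch; the last line is only an assumption that the configured distance lies in [0, 1], so check the docs/papers for the exact definition):

dist = hook1.distance(hook2, size=8)  # ~0 when the two recorded representations match
similarity_like = 1.0 - dist          # assumption: only meaningful if the chosen distance is in [0, 1]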

brando90 commented 3 years ago

I will add examples. It's confusing, but the returned value is a distance (not a similarity), I think.

How do I choose the parameters so that the experiments I run have small error/variance?

Is a large size and a large batch size all I need with respect to your library?

moskomule commented 3 years ago

Maybe CKA helps: SimilarityHook(..., cca_distance="lincka").
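
For example, swapping it into your snippet above (a sketch; only the cca_distance argument changes):

hook1 = SimilarityHook(mdl, "fc1_l1", cca_distance="lincka")
hook2 = SimilarityHook(mdl, "fc1_l1", cca_distance="lincka")
mdl.eval()
with torch.no_grad():
    mdl(x)                    # x of shape (num_samples_per_task, Din), as before
hook1.distance(hook2, size=8)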

brando90 commented 3 years ago

Maybe CKA helps: SimilarityHook(..., cca_distance="lincka").

curious, why do you say that?

Is it not possible with CCA? E.g., I had in mind these parameters:

iters = 10
num_samples_per_task = 100
size = 8

Is larger better? What values do you think are good?

my sample code:

#%%

import torch
import torch.nn as nn
from anatome import SimilarityHook

from collections import OrderedDict

from pathlib import Path

# get init
path_2_init = Path('~/data/logs/logs_Nov17_13-57-11_jobid_416472.iam-pbs/ckpt_file.pt').expanduser()
ckpt = torch.load(path_2_init)
mdl = ckpt['f']

#
Din, Dout = 1, 1
mdl = nn.Sequential(OrderedDict([
    ('fc1_l1', nn.Linear(Din, Dout)),
    ('out', nn.SELU())
]))
# mdl.fc1_l1.weight.fill_(2.0)
# mdl.fc1_l1.bias.fill_(2.0)

#
hook1 = SimilarityHook(mdl, "fc1_l1")
hook2 = SimilarityHook(mdl, "fc1_l1")
mdl.eval()

# params for doing "good" CCA
iters = 10
num_samples_per_task = 100
size = 8
# start CCA comparison
lb, ub = -1, 1
with torch.no_grad():
    for _ in range(iters):
        x = torch.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))
        mdl(x)
hook1.distance(hook2, size=size)

btw, thanks for your help and prompt replies! :D

brando90 commented 3 years ago

also, in the line:

        x = torch.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))

should I really be passing random data or should I be passing real data from my data set?
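
To make the question concrete, this is roughly what I mean by real data (a sketch; real_x and real_y are hypothetical placeholders for tensors from my actual dataset):

from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(real_x, real_y)   # hypothetical: tensors from my real dataset
loader = DataLoader(dataset, batch_size=num_samples_per_task, shuffle=True)

with torch.no_grad():
    for xb, _ in loader:
        mdl(xb)               # hooks record activations on real inputs instead of Uniform noise
hook1.distance(hook2, size=size)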

brando90 commented 3 years ago

Sorry for the spam, but your example doesn't show how to use your library for comparing two different models (same architecture, compared at the same layer).

        model = resnet18()
        hook1 = SimilarityHook(model, "layer3.0.conv1")
        hook2 = SimilarityHook(model, "layer3.0.conv2")
        model.eval()
        with torch.no_grad():
            for _ in range(10):
                model(torch.randn(120, 3, 224, 224))
        hook1.distance(hook2, size=8)

what I want is something more like this:

sim.CCA(mdl1, mdl2, layer)

how does one do this correctly with your library?

Thanks in advance for your time!


Seems this is working: the distance is about 0.5, which is what I'd hoped for, since these are nets I expected to be very different:

import torch
import torch.nn as nn
from anatome import SimilarityHook

from collections import OrderedDict

from pathlib import Path

# get init
# path_2_init = Path('~/data/logs/logs_Nov17_13-57-11_jobid_416472.iam-pbs/ckpt_file.pt').expanduser()
# ckpt = torch.load(path_2_init)
# mdl = ckpt['f']

#
Din, Dout = 1, 1
mdl1 = nn.Sequential(OrderedDict([
    ('fc1_l1', nn.Linear(Din, Dout)),
    ('out', nn.SELU()),
    ('fc2_l2', nn.Linear(Din, Dout)),
]))
mdl2 = nn.Sequential(OrderedDict([
    ('fc1_l1', nn.Linear(Din, Dout)),
    ('out', nn.SELU()),
    ('fc2_l2', nn.Linear(Din, Dout)),
]))
with torch.no_grad():
    mu = torch.zeros(Din)
    # std =  1.25e-2
    std = 10
    noise = torch.distributions.normal.Normal(loc=mu, scale=std).sample()
    # mdl2.fc1_l1.weight.fill_(50.0)
    # mdl2.fc1_l1.bias.fill_(50.0)
    mdl2.fc1_l1.weight += noise
    mdl2.fc1_l1.bias += noise

#
hook1 = SimilarityHook(mdl1, "fc2_l2")
hook2 = SimilarityHook(mdl2, "fc2_l2")
mdl1.eval()
mdl2.eval()

# params for doing "good" CCA
iters = 10
num_samples_per_task = 500
size = 8
# start CCA comparison
lb, ub = -1, 1
with torch.no_grad():
    for _ in range(iters):
        x = torch.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))
        y1 = mdl1(x)
        y2 = mdl2(x)
        print((y1-y2).norm(2))
hook1.distance(hook2, size=size)

output:

tensor(40.9213)
tensor(40.9233)
tensor(40.9039)
tensor(40.9288)
tensor(40.9431)
tensor(40.9260)
tensor(40.9240)
tensor(40.9180)
tensor(40.9041)
tensor(40.9124)
Out[36]: 0.45677995681762695

Note that if you fill the weights with constants, or perturb only the first layer and then compare at the first layer, the distance is still low. I am assuming it has something to do with CCA's invariance to linear transformations (which I still need to read about more carefully).
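
For what it's worth, this is the kind of wrapper I had in mind for the sim.CCA(mdl1, mdl2, layer) interface above, built on SimilarityHook (just a hypothetical sketch on my side, not part of the library):

def model_layer_distance(mdl1, mdl2, layer_name, batches, cca_distance=None, size=8):
    # hypothetical convenience wrapper: compare two models at the same named layer
    kwargs = {} if cca_distance is None else {"cca_distance": cca_distance}
    hook1 = SimilarityHook(mdl1, layer_name, **kwargs)
    hook2 = SimilarityHook(mdl2, layer_name, **kwargs)
    mdl1.eval()
    mdl2.eval()
    with torch.no_grad():
        for x in batches:     # each batch of shape (batch_size, Din)
            mdl1(x)
            mdl2(x)
    return hook1.distance(hook2, size=size)

# usage with the two nets above
batches = [torch.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))
           for _ in range(iters)]
print(model_layer_distance(mdl1, mdl2, "fc2_l2", batches))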

moskomule commented 3 years ago

If I understand correctly, you are passing matrices of Dout x num_samples_per_task = 1 x 500 to CCA (or CKA). The size argument is only used for 4D tensors.

I'm not sure CCA can compare such matrices correctly and stably.
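
Concretely, with Dout = 1 the hooked layer only produces one feature per sample, so CCA/CKA see a (num_samples_per_task, 1) matrix on each side. Widening the compared layer gives the methods more than one feature direction to work with, e.g. (an illustrative sketch only; Dhidden is an arbitrary choice, not a recommendation about your model):

Din, Dhidden, Dout = 1, 16, 1
mdl1 = nn.Sequential(OrderedDict([
    ('fc1_l1', nn.Linear(Din, Dhidden)),  # this layer now outputs 16 features per sample
    ('out', nn.SELU()),
    ('fc2_l2', nn.Linear(Dhidden, Dout)),
]))
hook1 = SimilarityHook(mdl1, "fc1_l1")    # records (num_samples_per_task, 16) activations instead of (N, 1)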

brando90 commented 3 years ago

If I understand correctly, you are passing matrices of Dout x num_samples_per_task = 1 x 500 to CCA (or CKA). The size argument is only used for 4D tensors.

I'm not sure CCA can compare such matrices correctly and stably.

I have control over num_samples_per_task, so I guess I am wondering what number/condition would lower the chance that my results are questionable. E.g., is num_samples_per_task > # of weights for that layer a good heuristic?

moskomule commented 3 years ago

Check the papers.