Closed brando90 closed 3 years ago
sample code:
import torch
import torch.nn as nn
from anatome import SimilarityHook
from collections import OrderedDict
from pathlib import Path
# get init
path_2_init = Path('~/data/logs/logs_Nov17_13-57-11_jobid_416472.iam-pbs/ckpt_file.pt').expanduser()
ckpt = torch.load(path_2_init)
mdl = ckpt['f']
#
Din, Dout = 1, 1
mdl = nn.Sequential(OrderedDict([
    ('fc1_l1', nn.Linear(Din, Dout)),
    ('out', nn.SELU())
]))
#
hook1 = SimilarityHook(mdl, "fc1_l1")
hook2 = SimilarityHook(mdl, "fc1_l1")
mdl.eval()
#
num_samples_per_task = 5
lb, ub = -1, 1
x = torch.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))
with torch.no_grad():
    mdl(x[0])
hook1.distance(hook2, size=8)
Hi! Anatome supports 1D data (the dimension=2 case, i.e. a batch x features tensor). The batch-size dimension is needed to use CCA.
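A rough way to see why the batch dimension matters: CCA estimates correlations across samples, so a single sample leaves nothing to correlate. The sketch below uses plain Python (not anatome's API), and the activation values are made up for illustration.

```python
import statistics

# Hypothetical activations of one feature (values are made up for illustration).
single_sample = [0.37]
batch = [0.37, -0.82, 0.15, 0.64, -0.29]

# Correlation-based measures such as CCA normalize by the spread of each
# feature across samples, so that spread must be non-zero.
var_single = statistics.pvariance(single_sample)
var_batch = statistics.pvariance(batch)

print(var_single)  # zero spread: a correlation is undefined for one sample
print(var_batch)   # positive spread: a correlation can be estimated
```

For the snippet in the first post, this suggests calling mdl(x) on the whole batch rather than mdl(x[0]).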
ok, thanks, perhaps fixing the tutorial/example might help?
Perhaps a fully contained example like mine would be most helpful too.
@moskomule why am I getting really small values? (close to 0?)
I am comparing the same model to itself, so shouldn't the similarity be 1.0ish?
import torch
...: import torch.nn as nn
...: from anatome import SimilarityHook
...:
...: from collections import OrderedDict
...:
...: from pathlib import Path
...:
...: # get init
...: path_2_init = Path('~/data/logs/logs_Nov17_13-57-11_jobid_416472.iam-pbs/ckpt_file.pt').expanduser()
...: ckpt = torch.load(path_2_init)
...: mdl = ckpt['f']
...:
...: #
...: Din, Dout = 1, 1
...: mdl = nn.Sequential(OrderedDict([
...:     ('fc1_l1', nn.Linear(Din, Dout)),
...:     ('out', nn.SELU())
...: ]))
...:
...: #
...: hook1 = SimilarityHook(mdl, "fc1_l1")
...: hook2 = SimilarityHook(mdl, "fc1_l1")
...: mdl.eval()
...:
...: #
...: num_samples_per_task = 100
...: lb, ub = -1, 1
...: x = torch.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))
...: with torch.no_grad():
...:     mdl(x)
...: hook1.distance(hook2, size=8)
...:
Out[43]: 2.384185791015625e-07
I will add examples. It's confusing, but I think the value is a distance, not a similarity.
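For what it's worth, the printed value 2.384185791015625e-07 is exactly 2**-22, i.e. twice the float32 machine epsilon. That would be consistent with a similarity of essentially 1.0 and a distance computed as 1 - similarity. That relation is an assumption here, not something confirmed in this thread, and `to_float32` is a hypothetical helper used only to mimic float32 rounding in plain Python.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest float32 (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", value))[0]


reported = 2.384185791015625e-07  # the Out[43] value from the post above
float32_eps = 2.0 ** -23          # machine epsilon of float32

print(reported == 2.0 ** -22)     # the value is an exact power of two
print(reported / float32_eps)     # its size in units of float32 epsilon

# Assumption: distance = 1 - similarity, with the similarity stored as float32.
similarity = to_float32(1.0 - reported)
distance = 1.0 - similarity
print(similarity, distance)
```

Read this way, a result of about 2e-07 for a model compared with itself means "identical up to rounding", not "dissimilar".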
how do I feed parameters so that the experiments I run have small errors/variance?
is a large size and a large batch size all I need wrt your library?
Maybe CKA helps: SimilarityHook(..., cca_distance="lincka").
curious, why do you say that?
Is it not possible with CCA? E.g. I had in mind these parameters:
iters = 10
num_samples_per_task = 100
size = 8
Is larger always better? What do you think are good values?
my sample code:
#%%
import torch
import torch.nn as nn
from anatome import SimilarityHook
from collections import OrderedDict
from pathlib import Path
# get init
path_2_init = Path('~/data/logs/logs_Nov17_13-57-11_jobid_416472.iam-pbs/ckpt_file.pt').expanduser()
ckpt = torch.load(path_2_init)
mdl = ckpt['f']
#
Din, Dout = 1, 1
mdl = nn.Sequential(OrderedDict([
    ('fc1_l1', nn.Linear(Din, Dout)),
    ('out', nn.SELU())
]))
# mdl.fc1_l1.weight.fill_(2.0)
# mdl.fc1_l1.bias.fill_(2.0)
#
hook1 = SimilarityHook(mdl, "fc1_l1")
hook2 = SimilarityHook(mdl, "fc1_l1")
mdl.eval()
# params for doing "good" CCA
iters = 10
num_samples_per_task = 100
size = 8
# start CCA comparison
lb, ub = -1, 1
with torch.no_grad():
    for _ in range(iters):
        x = torch.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))
        mdl(x)
hook1.distance(hook2, size=size)
btw, thanks for your help and prompt replies! :D
also, in the line:
x = torch.torch.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))
should I really be passing random data or should I be passing real data from my data set?
Sorry for the spam, but your example doesn't show how to use your library for comparing two different models (same architecture, at the same layer).
model = resnet18()
hook1 = SimilarityHook(model, "layer3.0.conv1")
hook2 = SimilarityHook(model, "layer3.0.conv2")
model.eval()
with torch.no_grad():
    for _ in range(10):
        model(torch.randn(120, 3, 224, 224))
hook1.distance(hook2, size=8)
what I want is something more like this:
sim.CCA(mdl1, mdl2, layer)
how does one do this correctly with your library?
Thanks in advance for your time!
seems this is working, the distance is 0.5ish which is what I'd hoped for nets I expected to be very different:
import torch
import torch.nn as nn
from anatome import SimilarityHook
from collections import OrderedDict
from pathlib import Path
# get init
# path_2_init = Path('~/data/logs/logs_Nov17_13-57-11_jobid_416472.iam-pbs/ckpt_file.pt').expanduser()
# ckpt = torch.load(path_2_init)
# mdl = ckpt['f']
#
Din, Dout = 1, 1
mdl1 = nn.Sequential(OrderedDict([
    ('fc1_l1', nn.Linear(Din, Dout)),
    ('out', nn.SELU()),
    ('fc2_l2', nn.Linear(Din, Dout)),
]))
mdl2 = nn.Sequential(OrderedDict([
    ('fc1_l1', nn.Linear(Din, Dout)),
    ('out', nn.SELU()),
    ('fc2_l2', nn.Linear(Din, Dout)),
]))
with torch.no_grad():
    mu = torch.zeros(Din)
    # std = 1.25e-2
    std = 10
    noise = torch.distributions.normal.Normal(loc=mu, scale=std).sample()
    # mdl2.fc1_l1.weight.fill_(50.0)
    # mdl2.fc1_l1.bias.fill_(50.0)
    mdl2.fc1_l1.weight += noise
    mdl2.fc1_l1.bias += noise
#
hook1 = SimilarityHook(mdl1, "fc2_l2")
hook2 = SimilarityHook(mdl2, "fc2_l2")
mdl1.eval()
mdl2.eval()
# params for doing "good" CCA
iters = 10
num_samples_per_task = 500
size = 8
# start CCA comparison
lb, ub = -1, 1
with torch.no_grad():
    for _ in range(iters):
        x = torch.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))
        y1 = mdl1(x)
        y2 = mdl2(x)
        print((y1-y2).norm(2))
hook1.distance(hook2, size=size)
output:
tensor(40.9213)
tensor(40.9233)
tensor(40.9039)
tensor(40.9288)
tensor(40.9431)
tensor(40.9260)
tensor(40.9240)
tensor(40.9180)
tensor(40.9041)
tensor(40.9124)
Out[36]: 0.45677995681762695
Note that if you fill the weights with constants, or change only the first layer and then compare at the first layer, the distance is still low. I am assuming it has something to do with CCA's invariance to linear transformations (which I still need to read about more carefully).
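A small sketch of that invariance, in plain Python rather than anatome: with a single feature on each side, the only canonical correlation is the absolute Pearson correlation, and any two non-constant affine functions of the same input are perfectly correlated. The layer parameters below are hypothetical, and distance = 1 - correlation is an assumption about how the value is defined.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


random.seed(0)
x = [random.uniform(-1, 1) for _ in range(500)]

# Two hypothetical 1-in, 1-out linear layers with very different parameters.
out1 = [0.3 * v + 0.1 for v in x]
out2 = [50.0 * v + 50.0 for v in x]

# The raw outputs are far apart in the L2 sense ...
l2_gap = math.sqrt(sum((p - q) ** 2 for p, q in zip(out1, out2)))

# ... but a correlation-based distance ignores scale and shift entirely.
distance = 1.0 - abs(pearson(out1, out2))

print(l2_gap)
print(distance)
```

So at the first linear layer of a 1-in, 1-out net, a near-zero distance is expected no matter what the weights are; a nonlinearity between the perturbed layer and the compared layer is what lets the distance move away from zero.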
If I understand correctly, you pass matrices of Dout x num_samples_per_task = 1 x 500 to CCA (or CKA). size is only valid for 4D tensors. I'm not sure CCA can compare such matrices correctly and stably.
I have control of num_samples_per_task. So I guess I am wondering what number/condition would lower the chances that my results are questionable. E.g. is num_samples_per_task > # weights for that layer a good heuristic?
Check the papers.
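As a rough illustration of why the sample count matters (not a substitute for the papers): even two completely independent features show a non-zero sample correlation, and that spurious correlation shrinks as the number of samples grows. The sketch below is plain Python with a single feature per side; `mean_abs_corr` is a hypothetical helper, and the sample counts are arbitrary choices.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def mean_abs_corr(n_samples, trials=200, seed=0):
    """Average |correlation| between two independent uniform features."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = [rng.uniform(-1, 1) for _ in range(n_samples)]
        b = [rng.uniform(-1, 1) for _ in range(n_samples)]
        total += abs(pearson(a, b))
    return total / trials


# The true correlation is 0, so anything printed here is estimation noise.
for n in (5, 50, 500):
    print(n, round(mean_abs_corr(n), 3))
```

With more features per layer the effect gets stronger, since CCA has more directions in which to find a chance correlation, so the sample count should be large relative to the feature dimension being compared.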
I am using 1D data but I get this error:
why?