moskomule / anatome

Ἀνατομή is a PyTorch library to analyze representation of neural networks
MIT License

CCA code not working on GPU #4

Closed brando90 closed 3 years ago

brando90 commented 3 years ago

small example:

import torch
import torch.nn as nn
from anatome import SimilarityHook

from collections import OrderedDict

#
Din, Dout = 1, 1
mdl1 = nn.Sequential(OrderedDict([
    ('fc1_l1', nn.Linear(Din, Dout)),
    ('out', nn.SELU()),
    ('fc2_l2', nn.Linear(Din, Dout)),
]))
mdl2 = nn.Sequential(OrderedDict([
    ('fc1_l1', nn.Linear(Din, Dout)),
    ('out', nn.SELU()),
    ('fc2_l2', nn.Linear(Din, Dout)),
]))

print(f'is cuda available: {torch.cuda.is_available()}')

with torch.no_grad():
    mu = torch.zeros(Din)
    # std =  1.25e-2
    std = 10
    noise = torch.distributions.normal.Normal(loc=mu, scale=std).sample()
    # mdl2.fc1_l1.weight.fill_(50.0)
    # mdl2.fc1_l1.bias.fill_(50.0)
    mdl2.fc1_l1.weight += noise
    mdl2.fc1_l1.bias += noise

if torch.cuda.is_available():
    mdl1 = mdl1.cuda()
    mdl2 = mdl2.cuda()

hook1 = SimilarityHook(mdl1, "fc1_l1")
hook2 = SimilarityHook(mdl2, "fc1_l1")
mdl1.eval()
mdl2.eval()

# params for doing "good" CCA
iters = 10
num_samples_per_task = 500
size = 8
# start CCA comparison
lb, ub = -1, 1

for _ in range(iters):
    # sample inputs uniformly from [lb, ub]
    x = torch.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))
    if torch.cuda.is_available():
        x = x.cuda()
    y1 = mdl1(x)
    y2 = mdl2(x)
    print(f'y1 - y2 = {(y1-y2).norm(2)}')
print('about to do cca')
dist = hook1.distance(hook2, size=size)
print('cca done')
print(f'cca dist = {dist}')
print('--> Done!\a')

But it always crashes with a segmentation fault:

(automl-meta-learning) miranda9~/automl-meta-learning $ python test_cca_gpu.py 
is cuda available: True
y1 - y2 = 4.561897277832031
y1 - y2 = 3.7458858489990234
y1 - y2 = 3.8464999198913574
y1 - y2 = 4.947702407836914
y1 - y2 = 5.404015064239502
y1 - y2 = 4.85843563079834
y1 - y2 = 4.000360488891602
y1 - y2 = 4.194643020629883
y1 - y2 = 4.894904613494873
y1 - y2 = 4.7721710205078125
about to do cca
Segmentation fault

Why does this happen, and how can it be fixed?

moskomule commented 3 years ago

I don't know but it's a segmentation fault. Please ask the PyTorch dev team.
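
One way to narrow down where a segmentation fault originates is Python's standard-library `faulthandler` module, which dumps the Python-level traceback of every thread when the process receives a fatal signal. The sketch below is only a diagnostic aid, not a fix; the helper name `enable_crash_traceback` is hypothetical and not part of anatome or PyTorch.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=sys.stderr):
    """Ask the interpreter to dump Python tracebacks on a fatal signal.

    `stream` must be a real file object (it needs a file descriptor) and
    must stay open for as long as the handler is enabled.
    """
    # all_threads=True prints the stack of every Python thread,
    # which helps when a threading problem is suspected.
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()
```

Calling `enable_crash_traceback()` at the top of `test_cca_gpu.py`, before `hook1.distance(hook2, size=size)`, should show which Python call was active when the crash happened. The same effect is available without editing the script via `python -X faulthandler test_cca_gpu.py`.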

brando90 commented 3 years ago

> I don't know but it's a segmentation fault. Please ask the PyTorch dev team.

@moskomule I don't think this is related to PyTorch, since it only happens when using the anatome library. Are you parallelizing the CCA computation? If so, that might be the cause: a segmentation fault can occur when two threads try to access memory they don't have permission to touch (or something along those lines).

Here are the links to the segmentation errors with pytorch and stack overflow (so):

brando90 commented 3 years ago

@moskomule btw, I am happy to help make it work for your library :-)

From the research I did (on the PyTorch forum, etc.), I currently believe this is unlikely to be a bug in PyTorch (it seems to have been fixed since 0.4.0, and I think we are all using at least 1.0) and more likely a bug in anatome. Let me know if my question about multithreading helps us track down the bug. :)
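
The threading hypothesis can be checked from outside the script. The sketch below reruns a script in a child process with the common native thread pools limited to one thread and reports whether the child was killed by SIGSEGV. It is an assumption that these environment variables affect the code path that crashes; the helper names `single_thread_env` and `run_script` are hypothetical, and the signal check relies on POSIX behavior, where `subprocess` reports death by signal N as return code -N.

```python
import os
import signal
import subprocess
import sys


def single_thread_env(base=None):
    """Return a copy of the environment with native thread pools set to 1."""
    env = dict(os.environ if base is None else base)
    # Read by OpenMP, Intel MKL, and OpenBLAS respectively at import time.
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        env[var] = "1"
    return env


def run_script(argv, env):
    """Run a command; return (return code, whether it died from SIGSEGV)."""
    proc = subprocess.run(argv, env=env)
    crashed = proc.returncode == -signal.SIGSEGV
    return proc.returncode, crashed
```

For example, `run_script([sys.executable, "test_cca_gpu.py"], single_thread_env())` reruns the reproduction script single-threaded. If the crash disappears, that points toward a threading problem; if it persists, threading on the CPU side is probably not the cause.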