How should anatome be used if we are comparing nets during training?

brando90 commented 2 years ago

Since the code is attaching hooks and it seems stateful (due to using objects instead of pure functions), how should one use anatome to compute CCAs sims etc correctly?

Is using deep copy of the solution? Or deleting the hook after every use of similarity function?:

def cxa_dist(mdl1: nn.Module, mdl2: nn.Module, X: Tensor, layer_name: str,
             downsample_size: Optional[str] = None, iters: int = 1, cxa_dist_type: str = 'pwcca') -> float:
    import copy
    mdl1 = copy.deepcopy(mdl1)
    mdl2 = copy.deepcopy(mdl2)
    # print(cca_size)
    # meta_batch [T, N*K, CHW], [T, K, D]
    from anatome import SimilarityHook
    # get sim/dis functions
    hook1 = SimilarityHook(mdl1, layer_name, cxa_dist_type)
    hook2 = SimilarityHook(mdl2, layer_name, cxa_dist_type)
    mdl1.eval()
    mdl2.eval()
    for _ in range(iters):  # might make sense to go through multiple is NN is stochastic e.g. BN, dropout layers
        # x = torch_uu.torch_uu.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))
        # x = torch_uu.torch_uu.distributions.Uniform(low=-1, high=1).sample((15, 1))
        # x = torch_uu.torch_uu.distributions.Uniform(low=-1, high=1).sample((500, 1))
        mdl1(X)
        mdl2(X)
    # - size: size of the feature map after downsampling
    dist = hook1.distance(hook2, size=downsample_size)
    return float(dist)

brando90 commented 2 years ago

removing hooks:

def cxa_dist(mdl1: nn.Module, mdl2: nn.Module, X: Tensor, layer_name: str,
             downsample_size: Optional[str] = None, iters: int = 1, cxa_dist_type: str = 'pwcca') -> float:
    # import copy
    # mdl1 = copy.deepcopy(mdl1)
    # mdl2 = copy.deepcopy(mdl2)
    # print(cca_size)
    # meta_batch [T, N*K, CHW], [T, K, D]
    from anatome import SimilarityHook
    # get sim/dis functions
    hook1 = SimilarityHook(mdl1, layer_name, cxa_dist_type)
    hook2 = SimilarityHook(mdl2, layer_name, cxa_dist_type)
    mdl1.eval()
    mdl2.eval()
    for _ in range(iters):  # might make sense to go through multiple is NN is stochastic e.g. BN, dropout layers
        # x = torch_uu.torch_uu.distributions.Uniform(low=lb, high=ub).sample((num_samples_per_task, Din))
        # x = torch_uu.torch_uu.distributions.Uniform(low=-1, high=1).sample((15, 1))
        # x = torch_uu.torch_uu.distributions.Uniform(low=-1, high=1).sample((500, 1))
        mdl1(X)
        mdl2(X)
    # - size: size of the feature map after downsampling
    dist = hook1.distance(hook2, size=downsample_size)
    # - remove hook, to make sure code stops being stateful (I hope)
    remove_hook(mdl1, hook1)
    remove_hook(mdl2, hook2)

Is removing the hook a good idea?

moskomule commented 2 years ago

When I used it for my project, I trained a model and saved checkpoints, then compared models loaded from checkpoints.

brando90 commented 2 years ago

When I used it for my project, I trained a model and saved checkpoints, then compared models loaded from checkpoints.

hmmm... what I am doing is I have 1 net and then I compute pwcca, svcca, cka. All in one go in the code (that is why I am doing the deep copying and removing hooks). Do I need to do the deep copying and removing hooks for anatome to work properly? Or will there be unexpected bugs?

moskomule commented 2 years ago

I don't have enough time to consider this problem. Check https://github.com/google/svcca etc.

moskomule / anatome

How should anatome be used if we are comparing nets during training? #19