moskomule / anatome

Ἀνατομή is a PyTorch library to analyze representations of neural networks
MIT License

what is the recommended input to the similarity functions provided by anatome? #13

Closed brando90 closed 2 years ago

brando90 commented 2 years ago

I was wondering if you usually pass in the activations (after the ReLU) or the pre-activations (after the affine transform, before the ReLU)?

I realized that these similarity functions are often designed to be invariant to affine transformations (https://www.youtube.com/watch?v=TBjdvjdS2KM), so I wanted to make sure I chose the right one.

I could choose the values after the ReLU, but that feels weird because some entries will be killed according to the ReLU... so is the ReLU output really the right one?
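For concreteness, the invariance in question can be checked with a from-scratch linear CKA. This is a sketch following the standard definition (Kornblith et al.), not anatome's own API; the function name and data here are illustrative:

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> float:
    """Linear CKA between two [batch, features] representation matrices."""
    # Center each feature over the batch dimension.
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = (y.t() @ x).norm() ** 2
    den = (x.t() @ x).norm() * (y.t() @ y).norm()
    return (num / den).item()

torch.manual_seed(0)
x = torch.randn(100, 32)

# Invariant to isotropic scaling and translation ...
assert abs(linear_cka(x, 3.0 * x + 1.0) - 1.0) < 1e-5
# ... and to orthogonal transformations of the feature space.
q, _ = torch.linalg.qr(torch.randn(32, 32))
assert abs(linear_cka(x, x @ q) - 1.0) < 1e-5
```

Note the invariance is to orthogonal transformations plus isotropic scaling, not to arbitrary invertible affine maps, which is part of why the pre- vs. post-activation choice can matter.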

moskomule commented 2 years ago

I think most activation functions are ReLU-like, so if the pre-activations are similar to each other, then the post-activations are also similar. I don't think there is an ultimate similarity function, so choose one based on what you need.

brando90 commented 2 years ago

> I think most activation functions are ReLU-like, so if the pre-activations are similar to each other, then the post-activations are also similar. I don't think there is an ultimate similarity function, so choose one based on what you need.

in my opinion the pre-activation makes more sense because:

  1. ReLU introduces zeros, which artificially inflates the measured similarity.
  2. the pre-activations are the output of the affine transform, which is closer to how regression computes functions, so comparing them is closer to a functional comparison of the networks. (Ultimately, I'd argue functional equality is the ultimate similarity: if two functions are the same then they are the same and that's the end of the discussion, but in general that is hard to compute.) So I think the scores/pre-activations are a good value to use.
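As a sanity check on point 1, here is one way to capture both the pre-activations and the post-activations from the same layer using plain PyTorch forward hooks. Nothing here is anatome-specific; the toy model and names are made up for illustration:

```python
import torch
import torch.nn as nn

# Toy model for illustration; the layer indices below are specific to it.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

captured = {}

def save_output(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

# Pre-activations: output of the affine (Linear) layer.
model[0].register_forward_hook(save_output("pre"))
# Post-activations: output of the ReLU that follows it.
model[1].register_forward_hook(save_output("post"))

model(torch.randn(32, 8))

# The post-activations are exactly the pre-activations with the
# negative entries zeroed out -- the "killed" entries in question.
assert torch.equal(captured["post"], captured["pre"].relu())
frac_zero = (captured["post"] == 0).float().mean().item()
print(f"fraction of zeroed entries after ReLU: {frac_zero:.2f}")
```

Passing `captured["pre"]` rather than `captured["post"]` to a similarity function then compares the affine-transform outputs directly, avoiding the shared zero pattern that the ReLU imposes.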

Thanks again!