moskomule / anatome

Ἀνατομή is a PyTorch library to analyze representation of neural networks
MIT License
61 stars, 6 forks

potential improvement for CNN size=None (or bug?) #23

Closed: brando90 closed this 2 years ago

brando90 commented 2 years ago

I noticed that for size=None you assume each activation is a neuron.

It is possible to have this instead:

            # - convolution layer [M, C, H, W]
            if size is None:
                # - no downsampling: [M, C, H, W] -> [M, C, H*W]
                # each activation (in the spatial dimension) is an (effective) neuron,
                # so the (effective) data size is C
                # flatten(2) -> flatten from dim 2 to -1 (end)
                # self_tensor = self_tensor.flatten(start_dim=2, end_dim=-1).contiguous()
                # other_tensor = other_tensor.flatten(start_dim=2, end_dim=-1).contiguous()

                # improvement [M, C, H, W] -> [M, C*H*W]
                # flatten(1) -> flatten from dim 1 to -1 (end); start_dim=3 would not merge C, H and W
                self_tensor = self_tensor.flatten(start_dim=1, end_dim=-1).contiguous()
                other_tensor = other_tensor.flatten(start_dim=1, end_dim=-1).contiguous()
                return self.cca_function(self_tensor, other_tensor).item()

[Screenshot from the original paper: Screen Shot 2021-10-28 at 12 42 40 PM]

I am aware that you later compare them by looping through each data point, which is not exactly equivalent to the above, though that is a small nuance. That approach assumes C is the effective size of the data set and that each activation in the spatial dimension is a filter. But a neuron (vector) is usually considered to have a size with respect to the data set, so the layout is usually [M, CHW] or [MHW, C]. So I'm unsure why using the filter size as the effective size of the data set is justified for CCA.

I will go with [MHW, C], since I think defining one neuron per filter makes more sense, as does treating each patch seen by a filter as a data point. I think that, due to the nature of CCA, this is fine to apply even across layers. If you want to know why, I'm happy to copy and paste the relevant part of the background section of my paper here.
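To make the two layouts discussed above concrete, here is a minimal sketch of how a convolutional activation tensor of shape [M, C, H, W] could be reshaped into either [M, CHW] or [MHW, C]. The helper name `to_neuron_matrix` and its `layout` values are hypothetical and are not part of anatome; only the standard `torch.Tensor.flatten`, `permute`, and `reshape` calls are used.

```python
import torch


def to_neuron_matrix(x: torch.Tensor, layout: str) -> torch.Tensor:
    """Reshape conv activations [M, C, H, W] into a 2-D [data points, neurons] matrix.

    `layout` is a hypothetical option name used only in this sketch.
    """
    if x.dim() != 4:
        raise ValueError(f"expected [M, C, H, W], got shape {tuple(x.shape)}")
    m, c, h, w = x.shape
    if layout == "activation":
        # [M, C, H, W] -> [M, C*H*W]: every activation is a neuron, M data points
        return x.flatten(start_dim=1)
    if layout == "filter":
        # [M, C, H, W] -> [M, H, W, C] -> [M*H*W, C]:
        # every filter is a neuron, every spatial location is a data point
        return x.permute(0, 2, 3, 1).reshape(m * h * w, c)
    raise ValueError(f"unknown layout: {layout!r}")


x = torch.arange(2 * 3 * 4 * 5, dtype=torch.float32).reshape(2, 3, 4, 5)
by_activation = to_neuron_matrix(x, "activation")
by_filter = to_neuron_matrix(x, "filter")
print(by_activation.shape)  # torch.Size([2, 60])
print(by_filter.shape)      # torch.Size([40, 3])
```

Note that the [MHW, C] layout needs the `permute` before the `reshape`; reshaping [M, C, H, W] directly to [M*H*W, C] would mix channels and spatial positions within each row.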


[Screenshot: Screen Shot 2021-10-28 at 12 48 47 PM]

Thanks for your great library and feedback!

moskomule commented 2 years ago

Thank you for your comment. I did not intend to compare different layers, but maybe I should add an option. The reason I used looping was to avoid an OOM problem I faced at that time.
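As a rough illustration of why flattening everything into one matrix can run out of memory, the arithmetic below estimates the size of a single D x D matrix over D neurons for each layout. This assumes the CCA routine materializes such a matrix (for example a covariance matrix), which the thread does not state; the layer shape is also made up for illustration.

```python
def square_matrix_bytes(num_neurons: int, bytes_per_element: int = 4) -> int:
    """Bytes needed for one dense num_neurons x num_neurons float32 matrix."""
    return num_neurons * num_neurons * bytes_per_element


# Hypothetical conv layer output: 64 channels on a 32 x 32 feature map.
c, h, w = 64, 32, 32

# [M, C*H*W] layout: every activation is a neuron, so D = C*H*W = 65536.
flat_bytes = square_matrix_bytes(c * h * w)

# [M*H*W, C] layout, or looping over spatial positions: D = C = 64.
per_filter_bytes = square_matrix_bytes(c)

print(flat_bytes / 2**30)  # 16.0 (GiB)
print(per_filter_bytes)    # 16384 (bytes)
```

Under that assumption, the fully flattened layout needs about a million times more memory for this one matrix than a per-filter or per-position comparison.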

brando90 commented 2 years ago

Makes sense. I was worried about that in my current implementation.

I added options in my fork to compare via filter and via your implementation; comparing via activation is still a TODO:

Feel free to copy and paste it, ask questions, etc. Hope it helps.