moboehle / B-cos

B-cos Networks: Alignment is All we Need for Interpretability

Result folder #7

Closed h-moghaddam closed 1 year ago

h-moghaddam commented 1 year ago

Hello

I have a few questions regarding the contents of the result folder:

  1. How have you calculated logits.np? The shape of this array is [50000, 2]; 50000 is the number of images in the ImageNet validation set, but I cannot understand where 2 comes from. Is there any way I can (re)produce it?
  2. Could you please provide more information to help me better understand the contents of feature_layer_x.json? How are the neurons indexed, and what do the values of each neuron represent?
  3. How have you generated feature_layer_x_neurons_sorted.txt?

Thanks

moboehle commented 1 year ago

Hi,

thanks for the questions and sorry if this was unclear.

  1. logits.np holds the top-2 logits for every image; the second-highest activation is not relevant for the notebooks. If you evaluate the models on the validation set and record the top-2 logits for each image, you should be able to reproduce it easily (see the sketch after this list).
  2. The neurons are indexed by their position in the weight matrix of the given layer. The value of each neuron is its per-image maximal activation, averaged over all images.
  3. The sorted neurons are simply the neurons sorted by their average max activation as above.
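For concreteness, here is a minimal PyTorch sketch of all three steps. It assumes a B-cos `model` in eval mode, an ImageNet validation `loader`, and a target intermediate module `layer` are already set up; these names are illustrative, not the repository's exact scripts.

```python
import torch

# Assumed to exist already: `model` (a B-cos network), `loader` (the
# ImageNet validation set), `layer` (the intermediate layer of interest).
model.eval()

top2_logits = []         # will become the [50000, 2] array in logits.np
max_acts_per_image = []  # per-image max activation of each neuron

acts = {}
def store_activation(module, inp, out):
    # Keep the activations of the hooked layer for the current batch.
    acts["out"] = out

handle = layer.register_forward_hook(store_activation)

with torch.no_grad():
    for images, _ in loader:
        logits = model(images)
        # 1. top-2 logits per image -> rows of logits.np
        top2_logits.append(logits.topk(2, dim=1).values)
        # 2. max activation per neuron (channel) per image:
        #    reduce over the spatial dims, keep the channel dim
        out = acts["out"]
        max_acts_per_image.append(out.flatten(2).max(dim=2).values)

handle.remove()
top2_logits = torch.cat(top2_logits)      # shape [50000, 2]
max_acts = torch.cat(max_acts_per_image)  # shape [50000, n_neurons]

# Mean over images of the per-image max -> the values stored per neuron
# in feature_layer_x.json (indexed by position in the weight matrix).
mean_max = max_acts.mean(dim=0)

# 3. neurons sorted by their average max activation ->
#    feature_layer_x_neurons_sorted.txt
sorted_neurons = mean_max.argsort(descending=True)
```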

I hope this helps!

h-moghaddam commented 1 year ago

Thanks a lot for your reply. The logits.np is clear to me now. Could you please elaborate on how I can reproduce $\mathbf W_{1\rightarrow L}(\mathbf x)$ up to a specific layer? I assume the results are later dumped into feature_layer_x.json, is that correct?

moboehle commented 1 year ago

Hi,

to obtain $\mathbf W_{1\rightarrow L}(\mathbf x)$, you can simply compute the activations at an intermediate layer and run the backward pass from them with the model in explanation mode, i.e., the same process as for explaining the logits.
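A minimal sketch of that process, assuming a B-cos `model` that exposes an explanation-mode switch (written as an `explanation_mode()` context manager here purely for illustration; check the repository's utilities for the actual call) and a target intermediate module `layer`:

```python
import torch

# `model` and `layer` are assumed to be set up already; their names and
# the `explanation_mode()` context manager are illustrative, not
# necessarily the repository's exact API.
acts = {}

def store_activation(module, inp, out):
    acts["out"] = out

handle = layer.register_forward_hook(store_activation)

x = torch.randn(1, 3, 224, 224, requires_grad=True)  # example input

with model.explanation_mode():  # assumed switch, see note above
    model(x)
    # Backpropagate from a single intermediate neuron, e.g. the spatial
    # max of channel c at the hooked layer.
    c = 0
    acts["out"][0, c].max().backward()

# In explanation mode, the input gradient is the corresponding row of
# W_{1->L}(x) for that neuron.
w_row = x.grad.clone()
handle.remove()
```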

h-moghaddam commented 1 year ago

Thanks for your reply.