I couldn't follow your calculation of the activations 100%. As I understood from the paper, the activations of one layer are the unit outputs, so the dimension of the activations/unit outputs should be the same as the output shape of that layer!?
To calculate the correlation between the mean squared envelopes and the activations, their dimensions have to be the same. And you said that they are exactly the same, since the envelopes are calculated with the receptive field size of the corresponding layer.
Your deep model contains the following layers with output shapes:
Layer (type) Output Shape Param #
Expression-1 [-1, 1, 1000, 22] 0
Conv2d-2 [-1, 25, 991, 22] 275
Conv2d-3 [-1, 25, 991, 1] 13,750
BatchNorm2d-4 [-1, 25, 991, 1] 50
Expression-5 [-1, 25, 991, 1] 0
MaxPool2d-6 [-1, 25, 330, 1] 0
Expression-7 [-1, 25, 330, 1] 0
Dropout-8 [-1, 25, 330, 1] 0
Conv2d-9 [-1, 50, 321, 1] 12,500
BatchNorm2d-10 [-1, 50, 321, 1] 100
Expression-11 [-1, 50, 321, 1] 0
MaxPool2d-12 [-1, 50, 107, 1] 0
Expression-13 [-1, 50, 107, 1] 0
Dropout-14 [-1, 50, 107, 1] 0
Conv2d-15 [-1, 100, 98, 1] 50,000
BatchNorm2d-16 [-1, 100, 98, 1] 200
Expression-17 [-1, 100, 98, 1] 0
MaxPool2d-18 [-1, 100, 32, 1] 0
Expression-19 [-1, 100, 32, 1] 0
Dropout-20 [-1, 100, 32, 1] 0
Conv2d-21 [-1, 200, 23, 1] 200,000
BatchNorm2d-22 [-1, 200, 23, 1] 400
Expression-23 [-1, 200, 23, 1] 0
MaxPool2d-24 [-1, 200, 7, 1] 0
Expression-25 [-1, 200, 7, 1] 0
Conv2d-26 [-1, 4, 1, 1] 5,604
LogSoftmax-27 [-1, 4, 1, 1] 0
Expression-28 [-1, 4] 0
I implemented a Keras version of the model with exactly the same dimensions, only using channels-last format and average pooling.
Layer (type) Output Shape Param #
conv2d (Conv2D) (None, 22, 991, 25) 275
conv2d_1 (Conv2D) (None, 1, 991, 25) 13775
average_pooling2d (AveragePo (None, 1, 330, 25) 0
dropout (Dropout) (None, 1, 330, 25) 0
conv2d_2 (Conv2D) (None, 1, 321, 50) 12550
batch_normalization (BatchNo (None, 1, 321, 50) 200
average_pooling2d_1 (Average (None, 1, 107, 50) 0
dropout_1 (Dropout) (None, 1, 107, 50) 0
conv2d_3 (Conv2D) (None, 1, 98, 100) 50100
batch_normalization_1 (Batch (None, 1, 98, 100) 400
average_pooling2d_2 (Average (None, 1, 32, 100) 0
dropout_2 (Dropout) (None, 1, 32, 100) 0
conv2d_4 (Conv2D) (None, 1, 23, 200) 200200
batch_normalization_2 (Batch (None, 1, 23, 200) 800
average_pooling2d_3 (Average (None, 1, 7, 200) 0
dropout_3 (Dropout) (None, 1, 7, 200) 0
conv2d_5 (Conv2D) (None, 1, 1, 4) 5604
flatten (Flatten) (None, 4) 0
dense (Dense) (None, 4) 20
In Keras, I calculate the unit outputs for the pooling layers and the last convolution layer with the following code (these are the ends of the different blocks you mentioned in the paper):
import tensorflow as tf

# Outputs at the end of each block (the pooling layers) and the last conv layer
out_tensor_1 = denseConvNet.get_layer(index=2).output
out_tensor_2 = denseConvNet.get_layer(index=6).output
out_tensor_3 = denseConvNet.get_layer(index=10).output
out_tensor_4 = denseConvNet.get_layer(index=14).output
out_tensor_pred = denseConvNet.get_layer(index=-3).output

# Build one sub-model per block and collect its outputs on the test data
trained = []
earlyPredictor_1 = tf.keras.Model(denseConvNet.input, out_tensor_1)
trained.append(earlyPredictor_1.predict(data_test))
earlyPredictor_2 = tf.keras.Model(denseConvNet.input, out_tensor_2)
trained.append(earlyPredictor_2.predict(data_test))
earlyPredictor_3 = tf.keras.Model(denseConvNet.input, out_tensor_3)
trained.append(earlyPredictor_3.predict(data_test))
earlyPredictor_4 = tf.keras.Model(denseConvNet.input, out_tensor_4)
trained.append(earlyPredictor_4.predict(data_test))
earlyPredictor_pred = tf.keras.Model(denseConvNet.input, out_tensor_pred)
trained.append(earlyPredictor_pred.predict(data_test))
Because of that, my unit outputs have the same dimensions as the output shapes of the corresponding layers (channels last):
Block 1: (24, 1, 330, 25)
Block 2: (24, 1, 107, 50)
Block 3: (24, 1, 32, 100)
Block 4: (24, 1, 7, 200)
Prediction layer: (24, 1, 1, 4)
The receptive field sizes and the resulting envelope shapes of the corresponding layers are as follows:
Receptive field size of layer 2/Block 1: 10
Mean squared envelope shape of layer 2/Block 1: (24, 22, 991, 1)
Receptive field size of layer 6/Block 2: 39
Mean squared envelope shape of layer 6/Block 2: (24, 22, 962, 1)
Receptive field size of layer 10/Block 3: 126
Mean squared envelope shape of layer 10/Block 3: (24, 22, 875, 1)
Receptive field size of layer 14/Block 4: 387
Mean squared envelope shape of layer 14/Block 4: (24, 22, 614, 1)
Receptive field size of prediction layer: 927
Mean squared envelope shape of prediction layer: (24, 22, 74, 1)
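For reference, the temporal length of each envelope above follows from the 1000 input samples and the receptive field size via valid-convolution (no-padding) arithmetic; a minimal check, using only the numbers from the summaries above:

```python
# Temporal length of each moving-average envelope, assuming 1000 input samples
# and "valid" (no-padding) arithmetic: n_out = n_samples - receptive_field + 1
n_samples = 1000
for name, rf in [("Block 1", 10), ("Block 2", 39), ("Block 3", 126),
                 ("Block 4", 387), ("Prediction layer", 927)]:
    print(name, n_samples - rf + 1)   # -> 991, 962, 875, 614, 74
```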
Can you tell me how to understand the activations if they are not (simply) the output of one layer and how to compute them?
Hi, the main point that one needs to take into account is cropped decoding. See Figure 4 in https://onlinelibrary.wiley.com/doi/full/10.1002/hbm.23730 or the explanation at https://braindecode.org/auto_examples/plot_bcic_iv_2a_moabb_cropped.html
For cropped decoding without padding, the model produces as many outputs as (number of inputs - receptive field size + 1). Therefore one can average-pool the envelope with a pooling size equal to the receptive field size and then get exactly matching outputs from the pooling and from the output units in the temporal dimension. The average pooling will also pool over exactly the same inputs that the output unit used to compute its output.
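A minimal sketch of that matching step, assuming the shapes from your summaries above (the helper name and the random placeholder data are illustrative, not the original code):

```python
import numpy as np

def moving_average(x, window):
    """Average-pool a 1D signal with stride 1 and no padding
    (output length = len(x) - window + 1)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

n_samples = 1000
receptive_field = 387                           # e.g. block 4 of the deep model
squared_envelope = np.random.rand(n_samples)    # placeholder for one channel/band

pooled = moving_average(squared_envelope, receptive_field)
print(pooled.shape)  # (614,) = n_samples - receptive_field + 1, the same temporal
                     # length as the cropped-decoding outputs of that layer
```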
Concretely, to get cropped decoding, one must appropriately replace the max-pooling strides by dilations in the following layers.
Code from current braindecode does that here:
(it was done manually in this old repo, which is potentially more difficult to understand).
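Very roughly, the stride-to-dilation transformation looks like the following hand-rolled PyTorch sketch. It only illustrates the idea for a simple sequential model and is not braindecode's actual utility (which, if I remember its name correctly, is to_dense_prediction_model and handles more cases, e.g. average pooling):

```python
import torch.nn as nn

def strides_to_dilations(model):
    """Rough idea only: walk through a simple sequential conv net in forward
    order, give every conv / max-pool layer the dilation accumulated from the
    strides removed in earlier layers, and set its own temporal stride to 1."""
    dilation = 1
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.MaxPool2d)):
            # dilate along the temporal axis only (inputs are time x 1 "images")
            module.dilation = (dilation, 1)
            stride = module.stride if isinstance(module.stride, tuple) \
                else (module.stride, module.stride)
            if stride[0] > 1:
                dilation *= stride[0]   # remember the stride we are removing
            module.stride = (1, 1)
    return model
```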
Does that clear things up?
Hi, thanks for the clarification. That helped a lot in understanding the problem, since until now I had only worked with trial-wise training. You said one only has to "replace the max pooling stride by dilations in the following layers", like it is done in the mentioned function. So there is no need to transform the data that is used as input?
No, there is no need to transform the input. In our paper we actually used [-500, +4000] ms for both trial-wise and cropped decoding, which is 1125 timesteps at 250 Hz. [0, 4000] ms should give similar, if slightly worse, results.
Basic steps:

1. Filter to frequency bands: https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/envelopes.py#L153-L162
2. Compute the envelope (absolute value of the Hilbert transform): https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/envelopes.py#L171
3. Square the envelope (`square_before_mean` was True in our setting) [the envelope was saved to a file and reloaded]: https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/envelopes.py#L30-L31 ⚠️ Note there is possibly one mistake/discrepancy in the paper: we square before averaging (next step), not after ⚠️
4. Compute the moving average of the (squared) envelope within the receptive field of the corresponding layer
5. Compute the correlation with the activations, for the trained model https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/create_env_corrs.py#L44-L45 and the random model https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/create_env_corrs.py#L47-L48
   - Compute the activations: https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/create_env_corrs.py#L60 So compute per-batch activations and then aggregate them to per-trial activations in https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/veganlasagne/layer_util.py#L30-L54
   - Compute the correlation of envelope and activations: https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/create_env_corrs.py#L76 https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/envelopes.py#L59-L71

In the end, these correlations for the trained and the untrained model are saved: https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/create_env_corrs.py#L52-L53
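A condensed sketch of steps 1-4 in plain numpy/scipy; the filter order, band edges, and data shapes are placeholders rather than the settings from the linked code:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_squared_envelope(trials, low_hz, high_hz, fs=250.0):
    """Steps 1-3: bandpass-filter along time, take the absolute value of the
    Hilbert transform as the envelope, then square it (square before mean)."""
    b, a = butter(4, [low_hz / (fs / 2), high_hz / (fs / 2)], btype="bandpass")
    filtered = filtfilt(b, a, trials, axis=-1)      # (n_trials, n_chans, n_times)
    envelope = np.abs(hilbert(filtered, axis=-1))
    return envelope ** 2

def moving_average(x, window):
    """Step 4: average within the receptive field of the layer of interest
    (stride 1, no padding -> n_times - window + 1 temporal positions)."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda s: np.convolve(s, kernel, mode="valid"), -1, x)

# Illustrative usage on random data (24 trials, 22 channels, 1000 samples):
trials = np.random.randn(24, 22, 1000)
sq_env = band_squared_envelope(trials, 7.0, 13.0)   # e.g. an alpha band
env_block4 = moving_average(sq_env, 387)            # receptive field of block 4
print(env_block4.shape)                             # (24, 22, 614)
```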
Now, when you have these correlations for the trained and the untrained model, you can average across the units in a layer and then compute the difference between them (trained-model correlations minus untrained-model correlations). This is Figure 15 in https://onlinelibrary.wiley.com/doi/full/10.1002/hbm.23730
As a comparison, we also computed the correlations of the envelope with the class labels (no network involved!). This is shown in the rightmost plots of Figure 15, or class-resolved/per class in Figure 14.
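For illustration only, a rough sketch of how such a per-unit correlation and the trained-minus-untrained difference could be computed; the exact aggregation in the linked code may differ, and all names and shapes here are placeholders:

```python
import numpy as np

def corr_env_acts(env, acts):
    """Correlate the pooled squared envelope with the unit activations across
    all trials and temporal positions (flattened), per channel/band and unit.
    A rough stand-in for the linked correlation code, not a reimplementation."""
    # env:  (n_trials, n_chans, n_times)   pooled squared envelope of one band
    # acts: (n_trials, n_units, n_times)   activations of one layer
    n_chans, n_units = env.shape[1], acts.shape[1]
    corrs = np.empty((n_chans, n_units))
    for c in range(n_chans):
        for u in range(n_units):
            corrs[c, u] = np.corrcoef(env[:, c].ravel(), acts[:, u].ravel())[0, 1]
    return corrs

# Average across units in a layer, then take the trained - untrained difference:
# corrs_trained = corr_env_acts(env_block4, acts_trained)
# corrs_random  = corr_env_acts(env_block4, acts_random)
# diff_per_chan = corrs_trained.mean(axis=1) - corrs_random.mean(axis=1)
```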