
Explanation: Input-Feature Unit-Output Correlation Maps (Envelope Activation Correlation) #2

Status: Open · robintibor opened this issue 3 years ago

robintibor commented 3 years ago

Filter to frequency bands:

https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/envelopes.py#L153-L162
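
As a rough illustration of this step, a minimal scipy sketch with hypothetical names (not the repo's own filtering code):

    from scipy.signal import butter, filtfilt

    def bandpass(data, low_hz, high_hz, fs=250.0, order=4):
        # Zero-phase band-pass filter along the time axis (last axis).
        nyq = fs / 2.0
        b, a = butter(order, [low_hz / nyq, high_hz / nyq], btype="bandpass")
        return filtfilt(b, a, data, axis=-1)

    # e.g. restrict trials of shape (n_trials, n_channels, n_times) to the alpha band:
    # alpha_trials = bandpass(trials, 7, 13)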

Compute envelope

(absolute value of the Hilbert transform): https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/envelopes.py#L171
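
In scipy terms, this step is roughly (hypothetical variable names, continuing from the filtering sketch above):

    import numpy as np
    from scipy.signal import hilbert

    # The magnitude of the analytic signal is the envelope.
    envelope = np.abs(hilbert(band_filtered, axis=-1))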

Square Envelope

(square_before_mean was True in our setting; the envelope was saved to a file and reloaded) https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/envelopes.py#L30-L31 ⚠️ Note: there is possibly one mistake/discrepancy in the paper: we square before averaging (the next step), not after ⚠️
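
In numpy terms, the discrepancy amounts to the following (illustrative only, simplified here to a plain mean over the time axis):

    import numpy as np

    # envelope: result of the previous step
    mean_of_squares = np.mean(envelope ** 2, axis=-1)  # what the code does (square_before_mean=True)
    square_of_mean = np.mean(envelope, axis=-1) ** 2   # what the paper text suggests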

Compute Moving Average of the envelope within the receptive field of the corresponding layer

Basic steps:

  1. Determine receptive field size of the layer https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/envelopes.py#L37-L41
  2. Average the envelopes within the receptive field using pooling (see the sketch after this list) https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/envelopes.py#L76 https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/envelopes.py#L95
  3. Aggregate per-trial envelopes from the per-batch envelopes https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/envelopes.py#L80-L85
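
A minimal numpy sketch of steps 1 and 2, with hypothetical names (the repo implements the averaging with pooling layers):

    import numpy as np

    def moving_average_pool(squared_env, rf_size):
        # Stride-1 moving average of the squared envelope over windows of
        # receptive-field size: one value per network output position.
        # squared_env: (n_trials, n_channels, n_times)
        n_times = squared_env.shape[-1]
        n_out = n_times - rf_size + 1  # same count as the layer's outputs
        zeros = np.zeros(squared_env.shape[:-1] + (1,))
        csum = np.concatenate([zeros, np.cumsum(squared_env, axis=-1)], axis=-1)
        return (csum[..., rf_size:] - csum[..., :n_out]) / rf_size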

Compute Correlation with Activations

For the trained model https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/create_env_corrs.py#L44-L45 and for the random (untrained) model https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/create_env_corrs.py#L47-L48

Compute Activations

https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/create_env_corrs.py#L60 This computes the per-batch activations, which are then aggregated into per-trial activations in https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/veganlasagne/layer_util.py#L30-L54
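
Schematically, and with hypothetical names (the actual code uses compiled Lasagne/Theano functions), this amounts to:

    import numpy as np

    def compute_trial_activations(forward_fn, trials, batch_size=32):
        # Run the network batch-wise and stack the per-batch unit outputs
        # back into one per-trial activation array.
        outs = [forward_fn(trials[i:i + batch_size])
                for i in range(0, len(trials), batch_size)]
        return np.concatenate(outs, axis=0)  # (n_trials, n_units, n_out_positions)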

Compute Correlation of Envelope and Activations

https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/create_env_corrs.py#L76 https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/envelopes.py#L59-L71
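
A minimal numpy sketch of the correlation step, assuming the matching shapes from the pooling step above (hypothetical names):

    import numpy as np

    def env_act_correlation(mean_sq_env, acts):
        # mean_sq_env: (n_trials, n_channels, n_out), acts: (n_trials, n_units, n_out)
        # Pearson correlation over all trials and output positions,
        # giving one value per (channel, unit) pair.
        n_chans, n_units = mean_sq_env.shape[1], acts.shape[1]
        env_flat = mean_sq_env.transpose(1, 0, 2).reshape(n_chans, -1)
        act_flat = acts.transpose(1, 0, 2).reshape(n_units, -1)
        c = np.corrcoef(np.concatenate([env_flat, act_flat], axis=0))
        return c[:n_chans, n_chans:]  # (n_channels, n_units)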

In the end, these correlations for the trained and untrained model are saved: https://github.com/robintibor/braindevel/blob/21f58aa74fdd2a3b03830c950b7ab14d44979045/braindecode/analysis/create_env_corrs.py#L52-L53

Now, once you have these correlations for the trained and untrained model, you can average across the units in a layer and then compute the difference between the trained-model and untrained-model correlations. This is Figure 15 in https://onlinelibrary.wiley.com/doi/full/10.1002/hbm.23730
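
For example, for one layer and frequency band (hypothetical names):

    # corrs_trained, corrs_untrained: (n_channels, n_units) from the step above
    diff_per_channel = corrs_trained.mean(axis=1) - corrs_untrained.mean(axis=1)
    # one value per channel, plotted as a scalp map per layer and frequency band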

As a comparison, we also compute the correlations of the envelope with the class labels (no network involved!). These are shown in the rightmost plots of Figure 15, and class-resolved/per class in Figure 14.
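
That comparison could look roughly like this (a hedged numpy sketch, not the repo code):

    import numpy as np

    def env_label_correlation(mean_sq_env, labels, n_classes=4):
        # Correlate the per-trial envelope with a one-vs-rest class indicator,
        # giving one value per channel and class.
        env_per_trial = mean_sq_env.mean(axis=-1)      # (n_trials, n_channels)
        corrs = np.empty((env_per_trial.shape[1], n_classes))
        for cls in range(n_classes):
            indicator = (labels == cls).astype(float)  # (n_trials,)
            for ch in range(env_per_trial.shape[1]):
                corrs[ch, cls] = np.corrcoef(env_per_trial[:, ch], indicator)[0, 1]
        return corrs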

JulianWisser commented 3 years ago

How should one interpret and calculate the activations? Are they different from the layer outputs?

I couldn't follow your calculation of the activations 100%. As I understood from the paper, the activations of one layer are the unit outputs, so the dimensions of the activations/unit outputs are the same as the output shape of that layer?

To calculate the correlation between the mean squared envelopes and the activations, their dimensions have to match. And you said that they match exactly, since the envelopes are averaged over the receptive field of the corresponding layer.

Your deep model contains the following layers with output shapes:

    Layer (type)               Output Shape         Param #

    Expression-1          [-1, 1, 1000, 22]               0
        Conv2d-2          [-1, 25, 991, 22]             275
        Conv2d-3           [-1, 25, 991, 1]          13,750
   BatchNorm2d-4           [-1, 25, 991, 1]              50
    Expression-5           [-1, 25, 991, 1]               0
     MaxPool2d-6           [-1, 25, 330, 1]               0
    Expression-7           [-1, 25, 330, 1]               0
       Dropout-8           [-1, 25, 330, 1]               0
        Conv2d-9           [-1, 50, 321, 1]          12,500
  BatchNorm2d-10           [-1, 50, 321, 1]             100
   Expression-11           [-1, 50, 321, 1]               0
    MaxPool2d-12           [-1, 50, 107, 1]               0
   Expression-13           [-1, 50, 107, 1]               0
      Dropout-14           [-1, 50, 107, 1]               0
       Conv2d-15           [-1, 100, 98, 1]          50,000
  BatchNorm2d-16           [-1, 100, 98, 1]             200
   Expression-17           [-1, 100, 98, 1]               0
    MaxPool2d-18           [-1, 100, 32, 1]               0
   Expression-19           [-1, 100, 32, 1]               0
      Dropout-20           [-1, 100, 32, 1]               0
       Conv2d-21           [-1, 200, 23, 1]         200,000
  BatchNorm2d-22           [-1, 200, 23, 1]             400
   Expression-23           [-1, 200, 23, 1]               0
    MaxPool2d-24            [-1, 200, 7, 1]               0
   Expression-25            [-1, 200, 7, 1]               0
       Conv2d-26              [-1, 4, 1, 1]           5,604
   LogSoftmax-27              [-1, 4, 1, 1]               0
   Expression-28                    [-1, 4]               0

I implemented a Keras version of the model with exactly the same dimensions, only using channels-last ordering and average pooling.

       Layer (type)                 Output Shape              Param #

       conv2d (Conv2D)              (None, 22, 991, 25)       275
       conv2d_1 (Conv2D)            (None, 1, 991, 25)        13775
       average_pooling2d (AveragePo (None, 1, 330, 25)        0
       dropout (Dropout)            (None, 1, 330, 25)        0
       conv2d_2 (Conv2D)            (None, 1, 321, 50)        12550
       batch_normalization (BatchNo (None, 1, 321, 50)        200
       average_pooling2d_1 (Average (None, 1, 107, 50)        0
       dropout_1 (Dropout)          (None, 1, 107, 50)        0
       conv2d_3 (Conv2D)            (None, 1, 98, 100)        50100
       batch_normalization_1 (Batch (None, 1, 98, 100)        400
       average_pooling2d_2 (Average (None, 1, 32, 100)        0
       dropout_2 (Dropout)          (None, 1, 32, 100)        0
       conv2d_4 (Conv2D)            (None, 1, 23, 200)        200200
       batch_normalization_2 (Batch (None, 1, 23, 200)        800
       average_pooling2d_3 (Average (None, 1, 7, 200)         0
       dropout_3 (Dropout)          (None, 1, 7, 200)         0
       conv2d_5 (Conv2D)            (None, 1, 1, 4)           5604
       flatten (Flatten)            (None, 4)                 0
       dense (Dense)                (None, 4)                 20

In Keras, I calculate the unit outputs for the pooling layers and the last convolution layer with the following code (these are the ends of the different blocks you mentioned in the paper):

        import tensorflow as tf

        # Unit outputs at the ends of the blocks: the four pooling layers
        # and the last convolution layer.
        block_indices = [2, 6, 10, 14, -3]
        trained = []
        for idx in block_indices:
            out_tensor = denseConvNet.get_layer(index=idx).output
            early_predictor = tf.keras.Model(denseConvNet.input, out_tensor)
            trained.append(early_predictor.predict(data_test))

Because of that, my unit outputs have the same dimensions as the output shapes of the layers (channels last):

        Block 1                   (24, 1, 330, 25)
        Block 2                   (24, 1, 107, 50)
        Block 3                   (24, 1, 32, 100)
        Block 4                   (24, 1, 7, 200)
        Prediction Layer          (24, 1, 1, 4)

The receptive field size and the resulting envelope shapes of the corresponding layers are as follow:

        Receptive field size of layer 2/Block 1: 10
        Mean squared envelope shape of layer 2/Block 1: (24, 22, 991, 1)

        Receptive field size of layer 6/Block 2: 39
        Mean squared envelope shape of layer 6/Block 2: (24, 22, 962, 1)

        Receptive field size of layer 10/Block 3: 126
        Mean squared envelope shape of layer 10/Block 3: (24, 22, 875, 1)

        Receptive field size of layer 14/Block 4: 387
        Mean squared envelope shape of layer 14/Block 4: (24, 22, 614, 1)

        Receptive field size of pred layer: 927
        Mean squared envelope shape of pred layer: (24, 22, 74, 1)

Can you tell me how to understand the activations if they are not (simply) the output of one layer and how to compute them?

robintibor commented 3 years ago

Hi, the main point that one needs to take into account is cropped decoding. See Figure 4 in https://onlinelibrary.wiley.com/doi/full/10.1002/hbm.23730 or the explanation at https://braindecode.org/auto_examples/plot_bcic_iv_2a_moabb_cropped.html

For cropped decoding without padding, the model produces as many outputs as it has inputs minus receptive field size plus 1 (n_outputs = n_inputs - receptive_field_size + 1). Therefore, one can apply average pooling to the envelope with a pooling size equal to the receptive field size (and stride 1) and get outputs that exactly match the output units in the temporal dimension. And the average pooling will pool over exactly the same inputs that the output unit used to compute its output.
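
Your numbers above are consistent with this, e.g. for the prediction layer:

    import numpy as np

    # 1000 input samples, final receptive field 927
    # -> 1000 - 927 + 1 = 74 output positions,
    # matching the (24, 22, 74, 1) envelope shape you report.
    n_inputs, rf = 1000, 927
    n_outputs = n_inputs - rf + 1
    assert n_outputs == 74

    # A stride-1 moving average with window = receptive field yields the same count:
    pooled = np.convolve(np.random.randn(n_inputs) ** 2, np.ones(rf) / rf, mode="valid")
    assert pooled.shape[0] == n_outputs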

Concretely, to get cropped decoding one must remove the max-pooling strides and instead dilate the following layers accordingly.

Code from current braindecode does that here:

https://github.com/braindecode/braindecode/blob/6fe57295e5a1372dfd0abe77c494d1352bd647eb/braindecode/models/util.py#L9-L52

(it was done manually in this old repo, which is potentially more difficult to follow).
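
The idea behind that transform, as a simplified PyTorch sketch (not the braindecode implementation; it only handles modules that expose a dilation attribute, i.e. convolutions and max pooling, and dilates the temporal axis only):

    import torch.nn as nn

    def to_dense_model(model: nn.Module):
        # Set every temporal stride to 1 and dilate each later layer by the
        # product of the strides removed before it, so the model emits one
        # prediction per input position instead of one per stride step.
        stride_so_far = 1
        for module in model.modules():
            if hasattr(module, "dilation"):
                # dilate by the strides removed in earlier layers
                module.dilation = (stride_so_far, 1)
            if hasattr(module, "stride"):
                s = module.stride
                stride_so_far *= s[0] if isinstance(s, tuple) else s
                module.stride = (1, 1)

If I recall correctly, the linked braindecode utility is called to_dense_prediction_model and additionally tracks strides per axis.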

Does that clear things up?

JulianWisser commented 3 years ago

Hi, thanks for the clarification. That helped a lot in understanding the problem, since until now I had only worked with trial-wise training. You said one only has to "replace the max pooling stride by dilations in the following layers", as is done in the mentioned function. So there is no need to transform the data that is used as input?

robintibor commented 3 years ago

No, there is no need to transform the input. In our paper we actually used [-500, +4000] ms for both trial-wise and cropped decoding, which is 1125 timesteps at 250 Hz. [0, 4000] should give similar, if slightly worse, results.