Hi, thanks for releasing the code.
Looking at the diagram and the code, I believe the post-linear projection layer of one FocalBlock can be merged into the pre-linear layer of the next FocalBlock, since both are plain matrix multiplications with no activation in between. This would save parameters and inference time.
However, I am not sure what the effect would be if the post-linear layer were dropped during training.
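
To make the idea concrete, here is a minimal NumPy sketch of the merge I have in mind. It is not taken from the repository: `merge_linear`, the sizes `dim` and `out_dim`, and the random weights are all hypothetical, and it assumes nothing else (normalization, a residual connection, an activation) sits between the two layers.

```python
import numpy as np

def merge_linear(w1, b1, w2, b2):
    """Fold y = (x @ w1.T + b1) @ w2.T + b2 into one affine map.

    Weights use the (out_features, in_features) layout of torch.nn.Linear.
    """
    w = w2 @ w1        # combined weight, shape (out2, in1)
    b = w2 @ b1 + b2   # combined bias, shape (out2,)
    return w, b

rng = np.random.default_rng(0)
dim, out_dim = 8, 20  # hypothetical sizes, for illustration only

# Stand-ins for the post-linear layer of one block and the pre-linear layer of the next.
w_post, b_post = rng.standard_normal((dim, dim)), rng.standard_normal(dim)
w_pre, b_pre = rng.standard_normal((out_dim, dim)), rng.standard_normal(out_dim)

x = rng.standard_normal((4, dim))  # a batch of 4 token vectors
two_step = (x @ w_post.T + b_post) @ w_pre.T + b_pre

w, b = merge_linear(w_post, b_post, w_pre, b_pre)
one_step = x @ w.T + b

outputs_match = np.allclose(two_step, one_step)
params_saved = w_post.size + b_post.size  # dim * dim + dim
print(outputs_match, params_saved)
```

This only shows that the two forms agree at inference time for fixed weights. It says nothing about how a single merged layer would behave if trained from scratch.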
Looking forward to your opinion, thanks.