Merge Pre-Linear and Post-Linear layers?

Hi, thanks for releasing the code. Looking at the diagram and the implementation, I believe the Post-Linear projection layer of one FocalBlock can be merged into the Pre-Linear layer of the next FocalBlock, since both are matrix multiplications with no activation in between. This would save parameters and inference time. However, I am not sure what the effect would be of dropping the Post-Linear layer during training. Looking forward to your opinion. Thanks.
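To make the idea concrete, here is a minimal sketch of folding two back-to-back linear layers into one. It is plain Python rather than the repository's code, and every name in it (`linear`, `merge_linear`, `w_post`, `w_pre`, the layer sizes) is a hypothetical stand-in. It assumes nothing at all sits between the two projections: no activation, normalization, or residual connection.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def linear(x, weight, bias):
    """Compute x @ weight^T + bias for each row of x (weight is out x in)."""
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b for w_row, b in zip(weight, bias)]
        for row in x
    ]


def merge_linear(w1, b1, w2, b2):
    """Fold linear(linear(x, w1, b1), w2, b2) into a single (weight, bias) pair.

    W = W2 @ W1 and b = W2 @ b1 + b2. This is only valid when nothing
    (activation, norm, residual) is applied between the two layers.
    """
    w = matmul(w2, w1)
    b = [sum(w_ij * b_j for w_ij, b_j in zip(w2_row, b1)) + b2_i
         for w2_row, b2_i in zip(w2, b2)]
    return w, b


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, pre_out = 4, 9  # arbitrary sizes chosen for illustration only

# Hypothetical "Post-Linear" of one block (dim -> dim)
w_post, b_post = rand_matrix(dim, dim, rng), rand_matrix(1, dim, rng)[0]
# Hypothetical "Pre-Linear" of the next block (dim -> pre_out)
w_pre, b_pre = rand_matrix(pre_out, dim, rng), rand_matrix(1, pre_out, rng)[0]

x = rand_matrix(3, dim, rng)  # three input tokens

two_step = linear(linear(x, w_post, b_post), w_pre, b_pre)
w_merged, b_merged = merge_linear(w_post, b_post, w_pre, b_pre)
one_step = linear(x, w_merged, b_merged)

# Largest elementwise gap between the two-layer and merged outputs
max_diff = max(abs(p - q) for r1, r2 in zip(two_step, one_step) for p, q in zip(r1, r2))
print(max_diff)
```

The two outputs agree up to floating-point rounding, so the merge is exact for inference under the stated assumption. If a residual connection or a normalization layer sits between the two projections, this identity no longer holds, so it would need checking against the actual block definition.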

microsoft / FocalNet

Merge Pre-Linear and Post-Linear layers? #5