stella-von closed this 1 year ago
The short answer is: I do not know. I did not run this ablation after the architecture was finalized.
LCE was added to our architecture during our early exploration period. If I remember correctly, at that time we used DWConv in stages 1 & 2 and BRA in stages 3 & 4, and the improvement brought by LCE was 0.2 or so. I guess it plays the role of position encoding.
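To make the position-encoding guess concrete, here is a minimal sketch, not the code from the repository. It assumes LCE is a zero-padded depthwise convolution (the thread does not spell out its definition), and the function name, toy input, and kernel weights are all hypothetical. The idea: even when every input value is identical, zero padding makes border outputs differ from interior ones, so the output carries a hint of where each location sits.

```python
def depthwise_conv2d_zero_pad(x, kernel):
    """Apply one k x k kernel to a single-channel H x W map with zero padding.

    A depthwise convolution runs this independently on every channel, so a
    single channel is enough to show the effect. (Hypothetical helper.)
    """
    h, w = len(x), len(x[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    ii, jj = i + di - pad, j + dj - pad
                    # Positions outside the map contribute zero (zero padding).
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[di][dj] * x[ii][jj]
            out[i][j] = acc
    return out


# Toy input (assumption): a constant 5 x 5 map, so the content itself
# carries no positional information.
feature_map = [[1.0] * 5 for _ in range(5)]
box_kernel = [[1.0] * 3 for _ in range(3)]  # illustrative weights, not learned

lce_out = depthwise_conv2d_zero_pad(feature_map, box_kernel)
for row in lce_out:
    print(row)
```

With this toy setup the corners come out as 4.0, the edges as 6.0, and the interior as 9.0, so the output depends on location even though the input does not. Whether this is the mechanism behind the 0.2 or so gain was not tested here; it only illustrates why the guess is plausible.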
Thanks.
May I ask how much improvement the LCE used in the article has brought?