iumyx2612 closed this issue 1 year ago
Hi @iumyx2612,
Thank you for reaching out again, and sorry for the late reply. I've been extremely busy since I recently relocated to another country.
Regarding the question you raised, I would say you are probably correct, even though it is not that straightforward.
Thank you!!
In the paper, the authors state: "MSAs are low-pass filters, but Convs are high-pass filters". They also propose a way to harmonize Convs with MSAs: replacing Convs with MSAs, preferably at the end of a stage. And they suggest a design that "uses Convs in early stages and MSAs in late stages".
Sorry in advance if the following questions are dumb.
In the late stages, adding Convs after MSAs should decrease a model's performance, right? The late stages produce low-frequency features, and adding Convs there would suppress those features. I ran an experiment: I trained a hierarchical ViT, SegFormer, then replaced the last-stage 1x1 Conv in the decoder with a 3x3 Conv (pic below)
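To make the low-pass / high-pass wording concrete, here is a toy 1D sketch in plain Python. It is not the paper's analysis: the uniform averaging kernel is only a crude stand-in for attention-style averaging, the difference kernel is only a stand-in for an edge-like Conv filter, and all names (`dft_amplitudes`, `circular_filter`, `gain`) are made up for this illustration.

```python
import cmath
import math


def dft_amplitudes(signal):
    """Return the amplitude of every DFT frequency bin of a real 1D signal."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def circular_filter(signal, kernel):
    """Apply a 3-tap kernel with circular padding."""
    n = len(signal)
    return [
        sum(w * signal[(t + offset) % n] for offset, w in zip((-1, 0, 1), kernel))
        for t in range(n)
    ]


n = 32
low = [math.cos(2 * math.pi * 1 * t / n) for t in range(n)]    # energy in bin 1
high = [math.cos(2 * math.pi * 12 * t / n) for t in range(n)]  # energy in bin 12
signal = [a + b for a, b in zip(low, high)]

averaging = (1 / 3, 1 / 3, 1 / 3)  # assumed stand-in for spatial averaging
difference = (-0.5, 1.0, -0.5)     # assumed stand-in for an edge-like kernel

base = dft_amplitudes(signal)
smoothed = dft_amplitudes(circular_filter(signal, averaging))
sharpened = dft_amplitudes(circular_filter(signal, difference))


def gain(filtered, k):
    """Ratio of filtered to original amplitude at frequency bin k."""
    return filtered[k] / base[k]


print("averaging  gain at bin 1 / bin 12:", gain(smoothed, 1), gain(smoothed, 12))
print("difference gain at bin 1 / bin 12:", gain(sharpened, 1), gain(sharpened, 12))
```

The averaging kernel keeps the low-frequency component almost intact and strongly attenuates the high-frequency one; the difference kernel does the opposite. A trained Conv is not forced to behave like the difference kernel, so this only illustrates the terminology.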
I trained the model on a polyp segmentation dataset; the results are reported below:
I haven't tested whether replacing the 1x1 Conv in stages 1-2 with a 3x3 Conv increases performance, but is the conclusion I made above correct?
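For reference, the swap described above (1x1 Conv to 3x3 Conv on the last-stage features) can be sketched in PyTorch roughly like this. `ToyDecoderBranch`, the channel counts, and the feature-map shape are hypothetical placeholders, not the actual SegFormer decoder code.

```python
import torch
from torch import nn


class ToyDecoderBranch(nn.Module):
    """Hypothetical stand-in for one per-stage projection in a decoder head."""

    def __init__(self, in_channels, out_channels, kernel_size=1):
        super().__init__()
        # padding = kernel_size // 2 keeps the spatial size unchanged for odd kernels
        self.proj = nn.Conv2d(
            in_channels, out_channels, kernel_size, padding=kernel_size // 2
        )

    def forward(self, x):
        return self.proj(x)


features = torch.randn(2, 256, 16, 16)  # assumed shape of last-stage features

baseline = ToyDecoderBranch(256, 128, kernel_size=1)  # original 1x1 projection
variant = ToyDecoderBranch(256, 128, kernel_size=3)   # 3x3 replacement

out_1x1 = baseline(features)
out_3x3 = variant(features)

params_1x1 = sum(p.numel() for p in baseline.parameters())
params_3x3 = sum(p.numel() for p in variant.parameters())
print(out_1x1.shape, out_3x3.shape, params_1x1, params_3x3)
```

Both variants produce the same output shape, but the 3x3 version has roughly nine times as many weights, so a change in accuracy may not come from frequency behavior alone.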