It seems that two 3x3 convs work better, judging by UniFormer's choice of stem.
However, they do not report a strict ablation study on this (i.e. replacing only the patch embedding part). I did not try it either; I just followed their routine and focused on the attention part.
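For reference, here is a rough sketch of how the two stems compare on paper. The padding values (1 for the 3x3 convs, 3 for the 7x7 conv), the 224x224 input, and the channel widths (3 -> 32 -> 64 versus 3 -> 64) are assumptions for illustration, not values taken from either codebase, and `conv_out` and `stem_stats` are hypothetical helpers. It only counts conv weights and says nothing about accuracy.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a conv layer (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_stats(layers, in_ch, size):
    """Summarize a stack of convs.

    layers: list of (kernel, stride, padding, out_channels) tuples.
    Returns output size, total stride, receptive field, and weight count
    (biases and normalization layers are ignored).
    """
    rf, jump, weights = 1, 1, 0
    for kernel, stride, padding, out_ch in layers:
        size = conv_out(size, kernel, stride, padding)
        rf += (kernel - 1) * jump      # receptive field grows by (k-1) * current stride
        jump *= stride                 # accumulated stride w.r.t. the input
        weights += in_ch * out_ch * kernel * kernel
        in_ch = out_ch
    return {
        "out_size": size,
        "total_stride": jump,
        "receptive_field": rf,
        "weights": weights,
    }


# Assumed configurations: two 3x3 convs (stride 2 each) vs. one 7x7 conv (stride 4).
two_3x3 = stem_stats([(3, 2, 1, 32), (3, 2, 1, 64)], in_ch=3, size=224)
one_7x7 = stem_stats([(7, 4, 3, 64)], in_ch=3, size=224)

print("two 3x3:", two_3x3)
print("one 7x7:", one_7x7)
```

Under these assumptions both stems downsample by 4x and see the same 7x7 input window per output position, so the difference comes from the extra intermediate layer (and whatever normalization or activation sits between the two convs), not from the receptive field.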
On 8 Apr 2023, at 12:58 PM, LMMMEng wrote:
Thank you for your wonderful work!
I noticed that some recent works use two 3x3 convs (stride=2) instead of one 7x7 conv (stride=4) as the stem. Is it because the former leads to better results?
Got it, thank you!