Hello, there:
I have noticed that all BiFormer models take NCHW inputs, including the old_legacy version. May I ask why you did not adopt the PatchEmbedding method? What is your view on PatchEmbedding: will transformers perform better without it? Have you reached any conclusions? Thanks.