microsoft / Swin-Transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
https://arxiv.org/abs/2103.14030
MIT License
13.96k stars, 2.06k forks

Aren't patch merging and patch embedding doing the same thing? #330

Open jerrywn121 opened 1 year ago

jerrywn121 commented 1 year ago

Aren't patch merging and patch embedding doing the same thing? Why is patch merging implemented in a different way, when we could simply use a convolution with kernel size 2 and stride 2 to produce the output?
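To make the question concrete, here is a minimal NumPy sketch of the equivalence being asked about. It assumes the slicing order x0..x3 that the repository's `PatchMerging` appears to use, and it deliberately leaves out the LayerNorm that (as far as I can tell) is applied to the concatenated 4C features before the linear reduction. With that norm in place the layer is no longer a plain strided convolution, which may be part of the answer. The function names, shapes, and random weights below are illustrative only, not code from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)
B, H, W, C = 2, 8, 8, 3                       # illustrative sizes, channels-last layout
x = rng.standard_normal((B, H, W, C))
weight = rng.standard_normal((4 * C, 2 * C))  # linear reduction 4C -> 2C, no bias


def merge_then_linear(x, weight):
    """Concatenate each 2x2 neighbourhood along channels, then project (no norm)."""
    x0 = x[:, 0::2, 0::2, :]  # row offset 0, col offset 0
    x1 = x[:, 1::2, 0::2, :]  # row offset 1, col offset 0
    x2 = x[:, 0::2, 1::2, :]  # row offset 0, col offset 1
    x3 = x[:, 1::2, 1::2, :]  # row offset 1, col offset 1
    merged = np.concatenate([x0, x1, x2, x3], axis=-1)  # (B, H/2, W/2, 4C)
    return merged @ weight                               # (B, H/2, W/2, 2C)


def strided_conv2x2(x, kernel):
    """2x2 convolution with stride 2; kernel is (row offset, col offset, C_in, C_out)."""
    B, H, W, C = x.shape
    patches = x.reshape(B, H // 2, 2, W // 2, 2, C)  # (B, i, row offset, j, col offset, C)
    return np.einsum("bipjqc,pqco->bijo", patches, kernel)


# Rearrange the linear weight into a conv kernel. The concatenation order above
# places block k = row_offset + 2 * col_offset, so reshaping to (2, 2, C, 2C)
# yields (col offset, row offset, ...) and the first two axes must be swapped.
kernel = weight.reshape(2, 2, C, 2 * C).transpose(1, 0, 2, 3)

out_merge = merge_then_linear(x, weight)
out_conv = strided_conv2x2(x, kernel)
print(out_merge.shape, np.allclose(out_merge, out_conv))
```

So without the intermediate norm, the two formulations compute the same linear map and differ only in how the weights are laid out. The remaining difference is where the normalization sits: before the projection in patch merging, after it in patch embedding.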