snap-research / EfficientFormer

EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
https://arxiv.org/abs/2212.08059

How to apply MetaBlock3D to detection framework? #13

Closed lucasjinreal closed 2 years ago

lucasjinreal commented 2 years ago

Hi, if I'm reading it right, the output shapes of the 4 stages look something like this:

stage3 torch.Size([1, 448, 16, 16])
stage4 torch.Size([1, 448, 16, 16])
stage5 torch.Size([256, 1, 448])

Since the stage5 output is 3-dimensional, how should I feed these output features into an FPN-like detection architecture?

liyy201912 commented 2 years ago

Thanks for your interest in this work. Reshaping the feature of the 3D stage from [B, N, C] back to [B, C, H, W] format should work. The detection and segmentation code has been released; please refer to those implementations.
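
For readers who want to see the reshape spelled out, here is a minimal PyTorch sketch. It is not taken from the released detection code. The helper name `tokens_to_feature_map` and its `batch_first` flag are hypothetical, and it assumes the tokens were produced by flattening the spatial grid in row-major order (as `flatten(2)` would). Note that the stage5 shape posted above, [256, 1, 448], looks sequence-first ([N, B, C] with N = 16 × 16 = 256), so the sketch optionally swaps the first two axes before reshaping.

```python
import torch


def tokens_to_feature_map(tokens, height, width, batch_first=True):
    """Reshape a token sequence into a [B, C, H, W] feature map.

    Hypothetical helper: `tokens` is [B, N, C] when batch_first is True,
    otherwise [N, B, C]. Assumes N == height * width and row-major order.
    """
    if not batch_first:
        tokens = tokens.transpose(0, 1)  # [N, B, C] -> [B, N, C]
    b, n, c = tokens.shape
    if n != height * width:
        raise ValueError(f"expected {height * width} tokens, got {n}")
    # [B, N, C] -> [B, C, N] -> [B, C, H, W]
    return tokens.transpose(1, 2).reshape(b, c, height, width)


# Shape taken from the question: 256 tokens, batch size 1, 448 channels.
stage5 = torch.randn(256, 1, 448)
feat = tokens_to_feature_map(stage5, 16, 16, batch_first=False)
print(feat.shape)  # torch.Size([1, 448, 16, 16])
```

The resulting tensor has the same layout as the stage3 and stage4 outputs, so it can be passed to an FPN-style neck alongside them. Height and width have to be tracked separately, since they cannot be recovered from N alone for non-square inputs.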