Closed lucasjinreal closed 2 years ago
Hi @jinfagang , PoolFormer here is just a tool to demonstrate the MetaFormer concept; the implementation may not be efficient enough for industrial use. For example, nn.AvgPool2d may not be well optimized in CUDA. It can be replaced with a depthwise conv `self.token_mixer = nn.Conv2d(in_channels=dim, out_channels=dim, kernel_size=3, stride=1, padding=1, groups=dim)`
to speed things up. For GroupNorm, I don't yet know how to speed it up.
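For readers following along, here is a minimal, hypothetical sketch of that swap (the `dim` value and tensor shape are made up for illustration); it only checks that the depthwise conv is shape-compatible with the pooling mixer it replaces:

```python
import torch
import torch.nn as nn

dim = 64  # example channel count, not a value from the repo

# Token mixer as used in PoolFormer's Pooling module
pool_mixer = nn.AvgPool2d(kernel_size=3, stride=1, padding=1,
                          count_include_pad=False)

# Suggested drop-in replacement: a 3x3 depthwise conv, which tends to map to
# better-optimized CUDA kernels than nn.AvgPool2d
dw_mixer = nn.Conv2d(in_channels=dim, out_channels=dim, kernel_size=3,
                     stride=1, padding=1, groups=dim)

x = torch.randn(1, dim, 14, 14)
# Same output shape, so it slots into the block without other changes
assert dw_mixer(x).shape == pool_mixer(x).shape
```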
@yuweihao Hi, does that mean I have to retrain the whole model if I change nn.AvgPool2d to a DW conv?
@jinfagang You don't have to. In our experiments, replacing GN with BN and then reimplementing the pooling layer as a fixed, predefined depthwise conv gave us about a 30% speedup, with an accuracy drop of about 1% on ImageNet. If you use BN, you can fuse the Conv-BN pair at inference time to speed it up further.
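The Conv-BN fusion mentioned here can be done by folding the BatchNorm statistics into the conv weights for inference. A minimal sketch of the standard folding (not the authors' code; the helper name `fuse_conv_bn` is made up):

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BN running statistics into the preceding conv (inference only)."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      groups=conv.groups, bias=True)
    # BN(y) = scale * y + (beta - scale * mean), with scale = gamma / sqrt(var + eps)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused

conv = nn.Conv2d(16, 16, 3, padding=1, groups=16, bias=False)
bn = nn.BatchNorm2d(16)
bn.train()
bn(torch.randn(8, 16, 8, 8))  # populate running stats with something nontrivial
bn.eval()
conv.eval()

fused = fuse_conv_bn(conv, bn)
x = torch.randn(2, 16, 8, 8)
# One conv replaces the conv + BN pair, with identical outputs
assert torch.allclose(bn(conv(x)), fused(x), atol=1e-5)
```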
@chuong98 Can you show your pretrained fixed depthwise conv? How do you set its weights?
Hi @jinfagang , @chuong98 , I just found that CUDA strongly prefers the NHWC layout over NCHW [1]. However, PyTorch uses NCHW by default, and PoolFormer also uses this layout. Switching to channels-last may be another way to speed it up further [2].
The figure is from [1].
[1] https://docs.nvidia.com/deeplearning/performance/dl-performance-convolutional/index.html#tensor-layout
[2] https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html
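The channels-last switch from [2] is a one-line change on both the model and the input. A small CPU sketch (the model here is made up; on Ampere-class GPUs with AMP this is where the NHWC Tensor Core kernels kick in):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
x = torch.randn(1, 3, 32, 32)
out_nchw = model(x)  # baseline result in the default NCHW layout

# Convert parameters and input to NHWC ("channels_last"); this changes only
# the memory layout, not the logical tensor shape
model.to(memory_format=torch.channels_last)
x_cl = x.to(memory_format=torch.channels_last)
out_cl = model(x_cl)

# The layout change is a pure performance optimization: results are unchanged
assert torch.allclose(out_nchw, out_cl, atol=1e-5)
```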
Recently I trained a transformer-based instance segmentation model and tested it with different backbones; here are the results and the speed test:
(batchsize is the training batch size.) Why is PoolFormer the slowest one? Is that normal?
It is slower than PVTv2-b1 and its precision is lower...
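One way to check whether the pooling mixer itself is the bottleneck is to time the two token mixers in isolation. A rough (hypothetical) CPU micro-benchmark sketch; absolute numbers depend entirely on hardware, and GPU timing would additionally need torch.cuda.synchronize() around the timed region:

```python
import time
import torch
import torch.nn as nn

def bench(module, x, iters=50):
    """Average forward time per iteration, in seconds (CPU timing sketch)."""
    with torch.no_grad():
        for _ in range(5):  # warmup so lazy allocations don't skew the timing
            module(x)
        t0 = time.perf_counter()
        for _ in range(iters):
            module(x)
        return (time.perf_counter() - t0) / iters

dim = 64  # made-up stage width and input size for illustration
x = torch.randn(8, dim, 56, 56)
pool = nn.AvgPool2d(3, stride=1, padding=1, count_include_pad=False)
dw = nn.Conv2d(dim, dim, 3, stride=1, padding=1, groups=dim)

t_pool = bench(pool, x)
t_dw = bench(dw, x)
print(f"avgpool: {t_pool * 1e3:.2f} ms  dwconv: {t_dw * 1e3:.2f} ms")
```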