yangsenius / TransPose

PyTorch Implementation for "TransPose: Keypoint localization via Transformer", ICCV 2021.
https://github.com/yangsenius/TransPose/releases/download/paper/transpose.pdf
MIT License

Use different backbones #21

Closed · EckoTan0804 closed this issue 3 years ago

EckoTan0804 commented 3 years ago

Hello,

I want to replace the ResNet in TransPose-R with different backbones. What is the correct way to do this? Should I import the new backbone and replace the following code with its forward pass?

https://github.com/yangsenius/TransPose/blob/dab9007b6f61c9c8dce04d61669a04922bbcd148/lib/models/transpose_r.py#L403-L410

Thanks in advance!

yangsenius commented 3 years ago

Yes, you are right. You also need to ensure that the downsampling ratios of the position embedding and the upsampling layers coordinate with the backbone (see: position embedding, upsampling layers).
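
To make that constraint concrete, here is a small helper sketch (hypothetical code, not part of the repo) that derives the matching position-embedding grid and the number of stride-2 deconvolution layers from a backbone's downsampling ratio, assuming the heatmaps stay at 1/4 of the input resolution as in TransPose:

    import math

    # Hypothetical helper: given the input size and a backbone's downsampling
    # ratio, derive the position-embedding grid and how many stride-2 deconv
    # layers are needed to recover the 1/4 heatmap resolution.
    def derive_config(img_h, img_w, ratio):
        pe_h, pe_w = img_h // ratio, img_w // ratio
        num_deconv = int(math.log2(ratio // 4))  # each deconv upsamples by 2x
        return pe_h, pe_w, num_deconv

    print(derive_config(256, 192, 8))   # original TransPose-R ResNet: (32, 24, 1)
    print(derive_config(256, 192, 32))  # a 32x backbone: (8, 6, 3)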

EckoTan0804 commented 3 years ago

Thanks for your reply!

Currently I am trying to apply a new backbone to TransPose-R. The new backbone has a 32x downsampling ratio and outputs feature maps of size HxW = 8x6 for a 256x192 input. However, the position embedding of TransPose-R assumes an 8x downsampling ratio. Should I directly modify the position embedding as

    self.pe_h = h // 32
    self.pe_w = w // 32

or should I instead apply a 4x upsampling to the feature maps?

Btw, the position embedding and upsampling layers links you mentioned above seem to refer to the same line of code.

yangsenius commented 3 years ago

The position embedding part is right. For the upsampling layers, you should modify the deconvolution config in the YAML: NUM_DECONV_LAYERS, NUM_DECONV_FILTERS, NUM_DECONV_KERNELS. 8x upsampling via 3 deconvolutional layers is needed to recover the 1/32 feature map to 1/4 of the input size.
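
As a quick sanity check (a standalone sketch, not the repo's code; the layer hyperparameters are assumed to follow those YAML keys), three ConvTranspose2d layers with kernel 4, stride 2, and padding 1 each double the spatial size, so a 1/32 feature map is recovered to 1/4 of the input size:

    import torch
    import torch.nn as nn

    # Three stride-2 deconvs: each maps in -> (in - 1)*2 - 2 + 4 = 2*in, i.e. 2x up.
    deconv = nn.Sequential(*[
        nn.Sequential(
            nn.ConvTranspose2d(256, 256, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
        )
        for _ in range(3)  # NUM_DECONV_LAYERS: 3, filters 256, kernels 4
    ])

    feat = torch.randn(1, 256, 8, 6)  # 256x192 input downsampled by 32x
    print(deconv(feat).shape)         # torch.Size([1, 256, 64, 48]) = 1/4 size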

EckoTan0804 commented 3 years ago

> The position embedding part is right. For the upsampling layers, you should modify the deconvolution config in the YAML: NUM_DECONV_LAYERS, NUM_DECONV_FILTERS, NUM_DECONV_KERNELS. 8x upsampling via 3 deconvolutional layers is needed to recover the 1/32 feature map to 1/4 of the input size.

Thanks for your suggestion. Just to confirm: to apply my new backbone (which has a 32x downsampling ratio) to TransPose-R, I need to

  1. Adjust the dimension of positional embedding:

    self.pe_h = h // 32
    self.pe_w = w // 32
  2. Modify the config of the deconv layers as

    NUM_DECONV_LAYERS: 3
    NUM_DECONV_FILTERS:
    - 256
    - 256
    - 256
    NUM_DECONV_KERNELS:
    - 4
    - 4
    - 4
  3. Replace the forward pass of the ResNet backbone with that of my new backbone

If there's any mistake, please correct me.
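
For reference, here is a self-contained sketch of how the three steps fit together (class and attribute names are hypothetical, the encoder is simplified to add the position embedding once at the input, and a toy backbone stands in for the real one; the actual model is in lib/models/transpose_r.py):

    import torch
    import torch.nn as nn

    class ToyBackbone32x(nn.Module):
        """Stand-in backbone: five stride-2 convs -> 32x downsampling."""
        def __init__(self, out_channels=512):
            super().__init__()
            chans = [3, 64, 128, 256, 512, out_channels]
            self.stages = nn.Sequential(*[
                nn.Sequential(
                    nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                    nn.ReLU(inplace=True),
                )
                for i in range(5)
            ])

        def forward(self, x):
            return self.stages(x)

    class TransPoseSketch(nn.Module):
        def __init__(self, backbone, backbone_channels, d_model=256,
                     num_joints=17, img_h=256, img_w=192, ratio=32):
            super().__init__()
            # step 3: any module mapping (B, 3, H, W) -> (B, C, H/32, W/32)
            self.backbone = backbone
            self.reduce = nn.Conv2d(backbone_channels, d_model, 1)
            # step 1: position embedding at the backbone's output resolution
            self.pe_h, self.pe_w = img_h // ratio, img_w // ratio
            self.pos_embedding = nn.Parameter(
                torch.zeros(self.pe_h * self.pe_w, 1, d_model))
            layer = nn.TransformerEncoderLayer(d_model, nhead=8,
                                               dim_feedforward=1024)
            self.encoder = nn.TransformerEncoder(layer, num_layers=4)
            # step 2: three stride-2 deconvs = 8x upsampling, 1/32 -> 1/4
            self.deconv_layers = nn.Sequential(*[
                nn.Sequential(
                    nn.ConvTranspose2d(d_model, d_model, 4, stride=2, padding=1),
                    nn.BatchNorm2d(d_model),
                    nn.ReLU(inplace=True),
                )
                for _ in range(3)
            ])
            self.final_layer = nn.Conv2d(d_model, num_joints, 1)

        def forward(self, x):
            x = self.reduce(self.backbone(x))          # (B, d, H/32, W/32)
            b, d, h, w = x.shape
            x = x.flatten(2).permute(2, 0, 1)          # (h*w, B, d) tokens
            x = self.encoder(x + self.pos_embedding)   # simplified: pe added once
            x = x.permute(1, 2, 0).reshape(b, d, h, w)
            x = self.deconv_layers(x)                  # (B, d, H/4, W/4)
            return self.final_layer(x)                 # keypoint heatmaps

    model = TransPoseSketch(ToyBackbone32x(), backbone_channels=512)
    print(model(torch.randn(1, 3, 256, 192)).shape)    # (1, 17, 64, 48)

The printed shape confirms that a 32x backbone plus three 2x deconvolutions lands on the expected 1/4-resolution heatmaps (64x48 for a 256x192 input).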

yangsenius commented 3 years ago

There is no problem, that's all correct.

EckoTan0804 commented 3 years ago

Thanks a lot!