microsoft / Cream

This is a collection of our NAS and Vision Transformer work.

Using TinyVit_5m_224 as backbone to train a segmentation task #117

Closed haoxurt closed 2 years ago

haoxurt commented 2 years ago

Hi, thanks for sharing your excellent work. I want to try using TinyVit_5m_224 as the backbone for a segmentation task whose input size is 512x512. Do I need to change the original weights because of the different size? How can I do it?

wkcn commented 2 years ago

Thanks for your attention to our work!

TinyViT supports arbitrary input sizes, since the feature map is padded whenever its height or width is not a multiple of the attention window size. You do not need to change the original weights.

Padding the feature map:

# https://github.com/microsoft/Cream/blob/main/TinyViT/models/tiny_vit.py#L346
x = x.view(B, H, W, C)
# Pad the bottom and right edges so that H and W become multiples of the window size.
pad_b = (self.window_size - H % self.window_size) % self.window_size
pad_r = (self.window_size - W % self.window_size) % self.window_size
padding = pad_b > 0 or pad_r > 0

if padding:
    # F.pad pads the last dimensions first: (C_left, C_right, W_left, W_right, H_top, H_bottom)
    x = F.pad(x, (0, 0, 0, pad_r, 0, pad_b))

However, the padding operation may affect the performance of dense prediction. For better performance, you can choose window sizes that avoid padding at 512x512 resolution.

For example, change the window sizes to [16, 16, 32, 16] as in this config (a quick check is sketched below). The weight attention_biases will be resized when the function utils.load_pretrained is called.
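
As a quick sanity check, here is a small sketch of my own (not code from the repo; it assumes TinyViT's usual stride schedule of /4, /8, /16, /32 across the four stages) verifying that these window sizes divide the 512x512 feature maps exactly, so no padding is triggered:

img_size = 512
stage_strides = [4, 8, 16, 32]       # cumulative downsampling at each stage
window_sizes = [16, 16, 32, 16]      # proposed config for 512x512 input

for stride, win in zip(stage_strides, window_sizes):
    feat = img_size // stride        # feature-map side length at this stage
    assert feat % win == 0, f"padding needed: {feat} % {win} != 0"
    print(f"feature {feat}x{feat}, window {win}: {feat // win} windows per side")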

The weight attention_biases is resized by bicubic interpolation:

# https://github.com/microsoft/Cream/blob/main/TinyViT/utils.py#L136
# nH1: number of attention heads; S1/S2: old/new window side lengths.
relative_position_bias_table_pretrained_resized = torch.nn.functional.interpolate(
    relative_position_bias_table_pretrained.view(1, nH1, S1, S1),
    size=(S2, S2), mode='bicubic')
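
To make the resize step concrete, here is a minimal self-contained sketch (my own illustration, not the repo's exact code; the head count and the dummy table are assumptions) resizing a 7x7 bias table to 16x16:

import torch
import torch.nn.functional as F

nH1, S1, S2 = 4, 7, 16               # heads, old window side, new window side
table = torch.randn(nH1, S1 * S1)    # dummy stand-in for a pretrained bias table

resized = F.interpolate(
    table.view(1, nH1, S1, S1),      # treat the table as an nH1-channel image
    size=(S2, S2), mode='bicubic')
resized = resized.view(nH1, S2 * S2) # back to (heads, S2 * S2)
print(resized.shape)                 # torch.Size([4, 256])
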
haoxurt commented 2 years ago

Thanks for your quick reply. According to your reply, I only need to change the window sizes for better performance, and don't need to change the original weights. The weight attention_biases will be resized automatically in the function utils.load_pretrained. Is that right?

wkcn commented 2 years ago

Yes : )

haoxurt commented 2 years ago

Thanks very much!

HaoWuSR commented 1 year ago

Hi, could you please share the model for segmentation? I would be grateful if you could help me reproduce the network!

wkcn commented 1 year ago

Hi @HaoWuSR, sorry, we did not try the model on the segmentation task.