ubc-vision / COTR

Code release for "COTR: Correspondence Transformer for Matching Across Images"(ICCV 2021)
Apache License 2.0

Possible redundancy in the code #33

Closed Wuziyi616 closed 2 years ago

Wuziyi616 commented 2 years ago

Hi, I noticed that when constructing the Transformer, you always return the intermediate features at this line. However, after feeding them to the MLP for correspondence regression, you only take the prediction from the last layer at this line. So I guess you could set return_intermediate=False to save some memory/computation?
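To illustrate the suggestion, here is a minimal sketch (not the actual COTR code) of a DETR-style decoder with a return_intermediate flag: when the flag is on, every layer's output is stacked, even though a downstream head that only reads the last entry makes the extra copies wasted memory. `TinyDecoder` and its linear layers are hypothetical stand-ins for the real transformer layers.

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Toy decoder mimicking the return_intermediate pattern."""

    def __init__(self, num_layers=6, d_model=16, return_intermediate=True):
        super().__init__()
        # Hypothetical stand-in for real transformer decoder layers.
        self.layers = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(num_layers)
        )
        self.return_intermediate = return_intermediate

    def forward(self, x):
        intermediate = []
        for layer in self.layers:
            x = torch.relu(layer(x))
            if self.return_intermediate:
                intermediate.append(x)  # keeps num_layers copies alive
        if self.return_intermediate:
            return torch.stack(intermediate)  # (num_layers, ..., d_model)
        return x.unsqueeze(0)                 # (1, ..., d_model), last layer only
```

If the regression head only ever consumes `output[-1]`, both modes produce the same final prediction, so switching the flag off changes memory use but not results.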

Wuziyi616 commented 2 years ago

I have another (maybe stupid) question. In your positional embedding here, the order of x/y seems to be reversed. If I concatenate them via xy_embed = torch.stack([x_embed, y_embed], dim=-1) and print their values:

>>> xy_embed.shape
torch.Size([2, 8, 8, 2])  # this is fine
>>> xy_embed[0, 0, 0]
tensor([1., 1.])  # this is also fine
>>> xy_embed[0, 0, 1]
tensor([2., 1.])  # shouldn't it be [1., 2.]?
>>> xy_embed[0, 5, 1]
tensor([2., 6.])  # shouldn't it be [6., 2.]?

I find their values a bit confusing. Any insight here?
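For what it's worth, the printed values are reproducible under one assumption: that the embedding is built DETR-style, by a cumulative sum over an all-ones mask (the referenced line is not shown here, so this is a guess at the construction). In that case y_embed counts rows and x_embed counts columns, so the stacked pair at pixel [row, col] is (x, y) = (col+1, row+1):

```python
import torch

# Assumed DETR-style construction, not copied from COTR itself.
mask = torch.ones(2, 8, 8)       # (batch, H, W), no padding
y_embed = mask.cumsum(dim=1)     # 1..H going down the rows
x_embed = mask.cumsum(dim=2)     # 1..W going along the columns
xy_embed = torch.stack([x_embed, y_embed], dim=-1)

print(xy_embed[0, 0, 1])  # tensor([2., 1.]) -> x=2 (column 1), y=1 (row 0)
print(xy_embed[0, 5, 1])  # tensor([2., 6.]) -> x=2 (column 1), y=6 (row 5)
```

Under this reading the values are internally consistent: the tensor is indexed as [batch, row, col], but each stored pair is (x, y), i.e. (column coordinate, row coordinate), which is why [0, 0, 1] prints [2., 1.] rather than [1., 2.].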

jiangwei221 commented 2 years ago

Hey, yes, it sounds like a good idea to set return_intermediate=False. I would appreciate it if you could create a PR. I'm not sure about the positional encoding question; would a permute fix the ordering problem?

Wuziyi616 commented 2 years ago

For the positional encoding, I guess it's fine as long as it's consistent between training and testing :)