yinboc / liif

Learning Continuous Image Representation with Local Implicit Image Function, in CVPR 2021 (Oral)
https://yinboc.github.io/liif/
BSD 3-Clause "New" or "Revised" License

Questions about coordinate conversion #25

Closed peiyaoooo closed 3 years ago

peiyaoooo commented 3 years ago

Hi Yinbo, thank you for your impressive work. I'm confused about the coordinate conversion in https://github.com/yinboc/liif/blob/main/models/liif.py#L81, where the coordinates are used for grid-sampling the features. The coord here denotes the normalized index of the HR image, and q_coord seems to be the interpolated real HR index, based on the real feature map index. I guess this relies on the assumption that each pixel is located at the center of its grid cell. The following line is "rel_coord = coord - q_coord". What is the meaning of this rel_coord? I couldn't understand these conversions. Later you multiply rel_coord by the feature map scale for prediction. Is the range of rel_coord not [-1, 1]? I hope for your reply, and thank you again for your attention.

yinboc commented 3 years ago

Thanks for your interest in our work!

This is Equation (2) in the paper: "coord" holds the coordinates of the HR pixel centers (x_q), and "q_coord" holds the coordinates of the nearest feature vector (v*), so "rel_coord" is the relative coordinate that is input to the decoding function. Multiplying "rel_coord" by the feature map scale normalizes it into the range [-1, 1]: before this line the coordinates live in the global range [-1, 1], but we now want a local range of [-1, 1].
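To make the normalization concrete, here is a minimal 1-D sketch in plain Python. It is not the code from liif.py: the helper names `cell_centers` and `relative_coord` are made up for illustration, it assumes pixel and feature centers sit at the centers of equal cells tiling [-1, 1], and it ignores the local ensemble. With n_feat feature cells, each cell is 2 / n_feat wide, so the offset to the nearest center lies in [-1 / n_feat, 1 / n_feat], and multiplying by n_feat stretches it to [-1, 1].

```python
def cell_centers(n):
    """Centers of n equal-width cells tiling the global range [-1, 1]."""
    return [-1 + (2 * i + 1) / n for i in range(n)]


def relative_coord(x_q, n_feat):
    """Return (q_coord, rel_coord, scaled rel_coord) for one 1-D query.

    Hypothetical helper for illustration only; assumes nearest-center
    lookup on a feature map with n_feat cells along this axis.
    """
    centers = cell_centers(n_feat)
    # Coordinate of the nearest feature vector (plays the role of q_coord).
    q_coord = min(centers, key=lambda c: abs(x_q - c))
    # Offset of the query from that feature vector, still in global units.
    rel_coord = x_q - q_coord
    # Rescale so that one feature cell spans the local range [-1, 1].
    return q_coord, rel_coord, rel_coord * n_feat


# 8 HR pixel centers queried against a 4-cell LR feature map.
hr_coords = cell_centers(8)
results = [relative_coord(x, 4) for x in hr_coords]
for x, (q, rel, scaled) in zip(hr_coords, results):
    print(f"x_q={x:+.3f}  q_coord={q:+.3f}  rel={rel:+.3f}  scaled={scaled:+.3f}")
```

In this toy setup every unscaled offset is ±0.125 in global units, which becomes ±0.5 after scaling, so the decoder would see the same local range regardless of the feature map resolution.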

danxuhk commented 3 years ago

Why should we use a relative coordinate? Why not just query with the absolute coordinate (i.e. x_q)? The motivation is not quite clear to me. Could you explain it a bit? @yinboc