skhu101 / SHERF

Code for our ICCV'2023 paper "SHERF: Generalizable Human NeRF from a Single Image"

Positional Encoding for RGB #4

Closed greatbaozi001 closed 1 year ago

greatbaozi001 commented 1 year ago

Hi. I have read the paper and one question remains. As the paper mentions, a 2D encoder is adopted to extract a feature map f ∈ R^(64x256x256), then positional encoding is applied to the RGB values and the result is appended to the 2D feature map to form f ∈ R^(96x256x256). How can positional encoding map the 3 RGB channels to the (96-64) = 32 extra channels?

skhu101 commented 1 year ago

Hi, thanks for your interest in our work. We use positional encoding with 5 frequencies to map RGB ∈ R^(3x256x256) to R^(33x256x256). Then we append the first 32 channels of the encoded RGB, which lie in R^(32x256x256), to the feature map f ∈ R^(64x256x256), which finally forms f ∈ R^(96x256x256).
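
For readers following the channel arithmetic, here is a minimal per-pixel sketch. It is not taken from the SHERF code. It assumes the common NeRF-style encoding that keeps the 3 raw values and adds a sin and a cos term per channel at each of the 5 frequencies, which gives 3 + 3 x 2 x 5 = 33 and matches the count above. The function names `positional_encoding` and `fuse_pixel`, the frequencies 2^k, and the channel ordering are all assumptions made for illustration, so which channel gets dropped by the truncation may differ in the actual repository.

```python
import math

def positional_encoding(rgb, num_freqs=5):
    """Encode one RGB pixel as raw values plus sin/cos terms (assumed NeRF-style layout)."""
    encoded = list(rgb)  # the 3 raw channels come first
    for k in range(num_freqs):
        freq = 2.0 ** k  # assumed frequency schedule: 1, 2, 4, 8, 16
        encoded.extend(math.sin(freq * c) for c in rgb)
        encoded.extend(math.cos(freq * c) for c in rgb)
    return encoded  # length 3 + 3 * 2 * num_freqs

def fuse_pixel(feature, rgb, keep=32):
    """Append the first `keep` encoded RGB channels to a per-pixel feature vector."""
    return list(feature) + positional_encoding(rgb)[:keep]

pixel_rgb = (0.2, 0.5, 0.8)   # hypothetical RGB values in [0, 1]
pixel_feature = [0.0] * 64    # stand-in for the 64-channel encoder output at one pixel
encoded = positional_encoding(pixel_rgb)
fused = fuse_pixel(pixel_feature, pixel_rgb)
print(len(encoded), len(fused))  # 33 96
```

Applying the same operation at every pixel of a 256x256 image gives the R^(96x256x256) map described above. Under this assumed ordering, the single dropped channel is the highest-frequency cosine of the blue channel.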

greatbaozi001 commented 1 year ago

thanks, the answer is clear!

markkim1115 commented 1 year ago

Hi, sorry for reopening the issue. Is there a design reason for picking the first 32 dimensions of the encoded RGB? Thanks!

skhu101 commented 1 year ago

Hi, we mainly want to keep the dimensions of the 1D global, 2D pixel-aligned, and 3D point features the same, so that the later feature processing and fusion stages are easier.