Closed greatbaozi001 closed 1 year ago
Hi, thanks for your interest in our work. We use positional encoding with 5 frequencies to map RGB ∈ R^(3x256x256) to R^(33x256x256). Then we append the first 32 dimensions of the encoded RGB ∈ R^(32x256x256) to the feature map f ∈ R^(64x256x256), which finally forms f ∈ R^(96x256x256).
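For readers who want to check the channel arithmetic, here is a minimal per-pixel sketch. It assumes a NeRF-style encoding that keeps the 3 raw channels and adds a sin and a cos term per channel at each of the 5 frequencies, which would give 3 + 3 x 2 x 5 = 33 channels. The function name `positional_encoding`, the frequency scale 2^k, and the channel ordering are assumptions made for illustration and are not taken from the released code; which channel gets dropped by the truncation depends on that ordering.

```python
import math

def positional_encoding(rgb, num_freqs=5):
    """Hypothetical NeRF-style encoding of one RGB pixel.

    Output layout (an assumption): the raw values, then for each
    frequency 2^k the sines of all channels followed by the cosines.
    """
    encoded = list(rgb)  # 3 raw channels
    for k in range(num_freqs):
        freq = 2.0 ** k
        encoded.extend(math.sin(freq * c) for c in rgb)  # 3 sin channels
        encoded.extend(math.cos(freq * c) for c in rgb)  # 3 cos channels
    return encoded

rgb = (0.2, 0.5, 0.8)               # one pixel, values in [0, 1]
encoded = positional_encoding(rgb)  # 3 + 3 * 2 * 5 = 33 values
rgb_code = encoded[:32]             # keep the first 32 dimensions
feature = [0.0] * 64                # placeholder for the 64-d encoder feature
fused = feature + rgb_code          # 64 + 32 = 96 values per pixel

print(len(encoded), len(rgb_code), len(fused))  # 33 32 96
```

Applying the same steps to every pixel of a 256x256 image gives the shapes quoted above: 33x256x256 after encoding, 32x256x256 after truncation, and 96x256x256 after concatenation.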
Thanks, the answer is clear!
Hi, sorry for reopening the issue. Is there a design reason for picking the first 32 dimensions of the encoded RGB? Thanks!
Hi, we mainly want to keep the dimensions of the 1D global, 2D pixel-aligned, and 3D point features the same, so that the later feature processing and fusion stages are easier.
Hi. I have read the paper and one question remains for me. As the paper mentions, a 2D encoder is adopted to extract a feature map f ∈ R^(64x256x256), then positional encoding is applied to the RGB values and the resulting code is appended to the 2D feature map to form f ∈ R^(96x256x256). How can I map the 3 RGB channels to 96 - 64 = 32 channels with positional encoding?