Closed JHLew closed 2 years ago
Thank you for your interest in our work!
> According to my understanding, softmax splatting is applied not only on the RGB images but also the pyramidal features extracted.
Correct.
> According to my understanding, softmax splatting is applied not only on the RGB images but also the pyramidal features extracted.
Yes, if the extracted features were not warped then the synthesis network would have to perform correspondence estimation itself (which it doesn't have to do if we warp the features as well). And yes, the features are warped just like the images, just with downsampled flow at the lower resolutions; make sure to also scale the flow values accordingly.
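To illustrate the last point, here is a minimal NumPy sketch of downsampling a flow field for a lower pyramid level. The 2x2 average pooling is just one possible resampling choice, not necessarily what the actual implementation uses; the key detail is that the displacement values are rescaled along with the resolution.

```python
import numpy as np

def downsample_flow(flow):
    """Halve the spatial resolution of a flow field and rescale its values.

    flow: array of shape [2, H, W] holding (x, y) displacements in pixels.
    Hypothetical helper; 2x2 average pooling stands in for whatever
    interpolation the real pipeline uses.
    """
    two, h, w = flow.shape
    pooled = flow.reshape(two, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    # Displacements are measured in pixels, so halving the resolution
    # also halves the magnitude of each displacement vector.
    return pooled * 0.5

flow = np.ones((2, 8, 8), dtype=np.float32)  # uniform 1-pixel shift
flow_half = downsample_flow(flow)
print(flow_half.shape)  # (2, 4, 4)
print(flow_half[0, 0, 0])  # 0.5
```

A 1-pixel shift at full resolution corresponds to a 0.5-pixel shift at half resolution, which is why the rescaling matters.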
> Seems like how to get the Z metric for the RGB images is explained, but I could not find anything on the Z metric for the feature pyramid.
We compute the importance metric only at the highest resolution and then downsample it to obtain the importance metric for the lower levels; my apologies that the paper did not make this clear. You could also compute the importance metric separately for each level. It has been a while since I tried that configuration, but if I remember correctly it performed similarly well (I didn't try that configuration with fine-tuning the importance metric, though).
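The downsampling scheme above can be sketched as follows. This is an illustrative NumPy snippet, assuming 2x2 average pooling between levels (the actual resampling method may differ):

```python
import numpy as np

def avg_pool2x2(x):
    """2x2 average pooling over a [H, W] map."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def importance_pyramid(z_full, num_levels):
    """Importance metric for each pyramid level, obtained by repeatedly
    downsampling the full-resolution Z rather than recomputing it
    separately per level."""
    pyramid = [z_full]
    for _ in range(num_levels - 1):
        pyramid.append(avg_pool2x2(pyramid[-1]))
    return pyramid

z = np.random.rand(16, 16).astype(np.float32)
levels = importance_pyramid(z, 3)
print([lvl.shape for lvl in levels])  # [(16, 16), (8, 8), (4, 4)]
```

Each level's Z then accompanies the correspondingly downsampled flow when splatting the features at that resolution.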
> Or, do you simply downsample the Z obtained from the RGB images?
Just a quick note: you can also try computing Z using the photometric consistency of the first level of the extracted feature pyramid (a "feature" instead of "color" consistency, so to speak). Or both: concatenate the colors with the corresponding features from the first pyramid level and then use that to obtain Z. If I remember correctly, they all worked similarly well.
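A sketch of such a consistency-based importance metric, assuming the second input has already been backward-warped into the first frame (the warp itself is omitted). Feeding RGB gives the "color" variant, feeding first-level features gives the "feature" variant, and concatenating both along the channel axis gives the combined one. The name and the `alpha` scaling are illustrative, mirroring the (possibly fine-tuned) scale in the paper:

```python
import numpy as np

def importance_from_consistency(f0, f1_warped, alpha=1.0):
    """Negative L1 distance over channels as an importance metric Z.

    f0, f1_warped: arrays of shape [C, H, W]. Pixels whose values agree
    with the backward-warped counterpart get Z close to 0 (high softmax
    weight); larger mismatch yields more negative Z (lower weight).
    """
    return -alpha * np.abs(f0 - f1_warped).sum(axis=0)

f0 = np.zeros((3, 4, 4), dtype=np.float32)
f1_warped = np.ones((3, 4, 4), dtype=np.float32)
z = importance_from_consistency(f0, f1_warped)
print(z.shape)  # (4, 4)
print(z[0, 0])  # -3.0
```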
Wow, thank you for the quick & detailed explanation! Everything is crystal clear now!
Hi, I am currently trying to reproduce your entire model for comparison, but ran into a question I could not find answered in the paper or in the issues here.
According to my understanding, softmax splatting is applied not only on the RGB images but also on the extracted pyramidal features. Seems like how to get the Z metric for the RGB images is explained, but I could not find anything on the Z metric for the feature pyramid.
I assume that it is done exactly the same as for the RGB images, just that the inputs are feature maps instead of RGB images? (i.e., the UNet input is the features / feature maps instead of I_0 and I_1 of Eq. 14) Or, do you simply downsample the Z obtained from the RGB images?