sniklaus / softmax-splatting

an implementation of softmax splatting for differentiable forward warping using PyTorch

Importance metric #17

Closed hhhhhumengshun closed 4 years ago

hhhhhumengshun commented 4 years ago

Can you share the network architecture for the importance metric Z? Thanks

sniklaus commented 4 years ago
  1. It is a U-Net with 4 levels and 16, 32, 64, 96 channels per level.
  2. To feed the U-Net, we use a conv to convert I_0 from 3 channels to 12 channels and another conv to convert the photometric consistency (from equation 15) from 1 channel to 4 channels. Together, these 12 + 4 channels form the first level of features.
  3. The encoder of the U-Net uses relu-sconv-relu-conv blocks between each level (where sconv is a strided conv), the decoder uses upsample-relu-conv-relu-conv blocks between each level. The levels of the encoder and decoder are connected with relu-conv-relu-conv blocks with skip connections.
  4. Finally, we use a relu-conv-relu block with a skip connection to convert the 16 channels (processed by the U-Net) back to 1 channel as an importance metric.

I would advise against spending too much time on fine-tuning the importance metric, though. In my opinion, the gains do not justify the effort. The reason why we did it is to show that all parts of our proposed softmax splatting operator are differentiable and can be supervised end-to-end.
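To make step 2 concrete, here is a minimal sketch of the described input stem in PyTorch: one conv lifts I_0 from 3 to 12 channels, another lifts the 1-channel photometric consistency map to 4 channels, and concatenating both gives the 16-channel first level of the U-Net. The layer names are illustrative, not taken from the repository, and the kernel size and padding are assumptions chosen to preserve resolution.

```python
import torch

# Hypothetical stem: 3 -> 12 channels for the frame, 1 -> 4 channels for
# the photometric consistency map (equation 15 in the paper).
conv_image = torch.nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, padding=1)
conv_photometric = torch.nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding=1)

image = torch.rand(1, 3, 64, 64)        # I_0
photometric = torch.rand(1, 1, 64, 64)  # photometric consistency map

# 12 + 4 = 16 channels form the first level of features for the U-Net.
features = torch.cat([conv_image(image), conv_photometric(photometric)], dim=1)
print(features.shape)  # torch.Size([1, 16, 64, 64])
```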

chLFF commented 3 years ago

Several questions about the details...

  1. How do you combine the features during upsampling in the U-Net, by concatenation or summation? If concatenation, how do you obtain the 16-channel features before the output block?
  2. What kernel_size do you use?
  3. Regarding the implementation of the output block: does the skip connection of the output block have a conv to reduce channels? If not, how do the channels of the skipped features become one?

Thank you very much!

sniklaus commented 3 years ago

Thank you for your interest in our work!

  1. Addition/summation, just like in the GridNet architecture of the synthesis network. In fact, my implementation for the U-Net calls the same helper functions and creates a GridNet with one encoder and one decoder column (and three rows).
  2. The kernel size is 3 across the board.
  3. Yes, there is one conv-relu-conv output block that goes from 16 channels to 1 channel.
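Putting the details from this thread together, the importance-metric network could be sketched as below: four levels with 16/32/64/96 channels, relu-sconv-relu-conv encoder blocks, upsample-relu-conv-relu-conv decoder blocks, relu-conv-relu-conv lateral skips merged by summation, kernel size 3 everywhere, and a conv-relu-conv head down to 1 channel. This is a reading of the description, not the repository's actual implementation (which reuses the GridNet helper functions), and the class and attribute names are made up.

```python
import torch

class ImportanceUNet(torch.nn.Module):
    """Sketch of the importance metric U-Net as described in the thread."""

    def __init__(self):
        super().__init__()
        chans = [16, 32, 64, 96]  # channels per level
        self.down = torch.nn.ModuleList()
        self.up = torch.nn.ModuleList()
        self.skip = torch.nn.ModuleList()
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            # encoder between levels: relu - strided conv - relu - conv
            self.down.append(torch.nn.Sequential(
                torch.nn.ReLU(),
                torch.nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                torch.nn.ReLU(),
                torch.nn.Conv2d(c_out, c_out, 3, padding=1)))
            # decoder between levels: upsample - relu - conv - relu - conv
            self.up.append(torch.nn.Sequential(
                torch.nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                torch.nn.ReLU(),
                torch.nn.Conv2d(c_out, c_in, 3, padding=1),
                torch.nn.ReLU(),
                torch.nn.Conv2d(c_in, c_in, 3, padding=1)))
        for c in chans[:-1]:
            # lateral skip: relu - conv - relu - conv, merged by addition
            self.skip.append(torch.nn.Sequential(
                torch.nn.ReLU(),
                torch.nn.Conv2d(c, c, 3, padding=1),
                torch.nn.ReLU(),
                torch.nn.Conv2d(c, c, 3, padding=1)))
        # output head: conv - relu - conv, 16 -> 1 channel
        self.head = torch.nn.Sequential(
            torch.nn.Conv2d(chans[0], chans[0], 3, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(chans[0], 1, 3, padding=1))

    def forward(self, x):
        skips = []
        for block in self.down:
            skips.append(x)
            x = block(x)
        for block, lateral, s in zip(reversed(self.up), reversed(self.skip), reversed(skips)):
            x = block(x) + lateral(s)  # summation, not concatenation
        return self.head(x)

net = ImportanceUNet()
z = net(torch.rand(1, 16, 64, 64))  # 16-channel features from the input stem
print(z.shape)  # torch.Size([1, 1, 64, 64])
```

Summation (rather than concatenation) keeps every level at its nominal channel count, which is also why the decoder naturally ends at 16 channels before the output head.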