Pixel decoder design - GitHub Issues

Pang-Yatian commented 1 year ago

Dear author,

Thanks for your work.

I am a little confused about the design of the pixel decoder: "the pixel decoder consists of several straightforward up-sampling layers. Each layer comprises four types of computations: 1) 1x1 conv, 2) upsample, 3) concat, 4) mixed conv."
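For concreteness, one such layer could be sketched in PyTorch as below. This is only an illustration of the four steps in the quote, not the DatasetDM implementation: the class name `DecoderLayer`, the channel sizes, and the reading of "mixed conv" as a 3x3 convolution that fuses the concatenated features are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoderLayer(nn.Module):
    """Hypothetical layer: 1x1 conv -> upsample -> concat -> "mixed" conv."""

    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        # 1) 1x1 conv to change the channel count of the coarse feature
        self.reduce = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        # 4) "mixed conv": assumed here to be a 3x3 conv followed by ReLU
        self.mix = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.reduce(x)
        # 2) upsample to the spatial size of the UNet feature
        x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        # 3) concat with the UNet feature along the channel dimension
        x = torch.cat([x, skip], dim=1)
        return self.mix(x)


# Example with made-up shapes: a 32x32 feature fused with a 64x64 UNet feature.
layer = DecoderLayer(in_ch=32, skip_ch=16, out_ch=24)
coarse = torch.randn(1, 32, 32, 32)
skip = torch.randn(1, 16, 64, 64)
fused = layer(coarse, skip)
```

With these assumed shapes, `fused` has the spatial size of the UNet feature (64x64).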

I want to confirm: after upsampling to a spatial size of 64x64 and fusing the 64x64 feature from the UNet, how do you get the per-pixel embedding at, say, 512x512? Do you just apply three more layers using 1), 2), and 4), without 3)?

Looking forward to your reply.

weijiawu commented 1 year ago

Thank you for your interest in our work.

After obtaining the 64-resolution results, we directly perform linear interpolation to achieve a resolution of 512.
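A minimal PyTorch sketch of this step is shown below. It assumes that "linear interpolation" means bilinear interpolation over the two spatial dimensions and that it is applied to a `(batch, channels, 64, 64)` tensor; the tensor name `feat_64`, the channel count, and the `align_corners=False` setting are illustrative choices, not details confirmed in this thread.

```python
import torch
import torch.nn.functional as F

# Stand-in for the 64x64 decoder output (batch=1, 8 channels; sizes are made up).
feat_64 = torch.randn(1, 8, 64, 64)

# Resize directly from 64x64 to 512x512 with no extra learned layers.
out_512 = F.interpolate(
    feat_64,
    size=(512, 512),
    mode="bilinear",        # assumed reading of "linear interpolation" in 2D
    align_corners=False,
)
```

Because the resize has no learnable parameters, the 64x64 output is the last learned stage in this sketch; the 8x upsampling only changes resolution.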

Pang-Yatian commented 1 year ago

I get it. Thank you!

showlab / DatasetDM

Pixel decoder design #2