showlab / DatasetDM

[NeurIPS2023] DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models
https://weijiawu.github.io/DatasetDM_page/

Pixel decoder design #2

Closed. Pang-Yatian closed this issue 1 year ago

Pang-Yatian commented 1 year ago

Dear author,

Thanks for your work.

I am a little confused about the design of the pixel decoder. The paper says: “the pixel decoder consists of several straightforward up-sampling layers. Each layer comprises four types of computations: 1) 1x1 conv, 2) upsample, 3) concat, 4) mixed conv.”

I would like to confirm one thing: after upsampling to a spatial size of 64x64 and fusing the 64x64 feature from the UNet, how do you obtain the per-pixel embedding at, say, 512x512? Do you simply add three more layers that use 1), 2), and 4), without 3)?

Looking forward to your reply.

weijiawu commented 1 year ago

Thank you for your interest in our work.

Once we have the 64x64 result, we upsample it directly to 512x512 with linear interpolation; no additional decoder layers are used.
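
For readers who want to see what this step amounts to, here is a minimal sketch in plain Python. It is not the repository's code. It assumes that "linear interpolation" means bilinear interpolation over the two spatial axes with a half-pixel centre mapping, and the function name `bilinear_upsample` and the ramp input are made up for illustration. In a PyTorch pipeline the same step would more likely be a single call to `torch.nn.functional.interpolate` with `mode="bilinear"`.

```python
def bilinear_upsample(grid, out_h, out_w):
    """Resize a 2D list of floats to (out_h, out_w) with bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])

    def source_coords(out_idx, out_size, in_size):
        # Assumption: half-pixel centre mapping, clamped to the valid range.
        src = (out_idx + 0.5) * in_size / out_size - 0.5
        src = min(max(src, 0.0), in_size - 1)
        lo = int(src)
        hi = min(lo + 1, in_size - 1)
        return lo, hi, src - lo

    # Column neighbours and weights are identical for every row, so compute once.
    cols = [source_coords(x, out_w, in_w) for x in range(out_w)]
    out = []
    for y in range(out_h):
        y0, y1, wy = source_coords(y, out_h, in_h)
        row = []
        for x0, x1, wx in cols:
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Hypothetical 64x64 single-channel decoder output (a simple ramp).
low_res = [[float(x + y) for x in range(64)] for y in range(64)]
high_res = bilinear_upsample(low_res, 512, 512)
print(len(high_res), len(high_res[0]))  # prints "512 512"
```

For a multi-channel output, the same resize would be applied to each channel independently.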

Pang-Yatian commented 1 year ago

I get it. Thank you!