researchmm / PEN-Net-for-Inpainting

[CVPR'2019] PEN-Net: Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting
https://arxiv.org/abs/1904.07475
MIT License
361 stars 77 forks

Confused about multi-decoder feature size #1

Closed PaTricksStar closed 5 years ago

PaTricksStar commented 5 years ago

Here Eq. (5) is rewritten as phi(L-1) = g(psi(L-1) + g(Phi(L))), where "+" represents the channel concat operation. So phi(L-1) is a 4x upsampling of the smallest feature size, since there are two 'g', each of which represents a transposed conv. In the supplementary file, L=7 and there are six stride-2 convs. So phi6 is (h/16, w/16)? If so, phi2 is already (h, w), which seems wrong. Forgive my poor description Orz
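The size arithmetic in this question can be checked with a short sketch. It assumes a hypothetical 256-pixel input side, that each stride-2 conv halves the side length exactly, and that each transposed conv `g` doubles it. It also assumes that every level below phi6 adds one further doubling, which is one way to read how phi2 would reach full resolution. The function names are illustrative and are not taken from the PEN-Net code.

```python
def smallest_size(h, num_stride2_convs=6):
    """Side length after the encoder, assuming each stride-2 conv halves it exactly."""
    return h // (2 ** num_stride2_convs)


def g(size):
    """Stand-in for a transposed conv, assumed to double the side length."""
    return size * 2


def decoder_sizes(h, top_level=7):
    """Side lengths of phi_{L-1} down to phi_2 under the reading in this issue.

    phi_{L-1} applies g twice to the smallest feature; each lower level is
    assumed to add one more doubling.
    """
    sizes = {}
    size = g(g(smallest_size(h)))  # phi_{L-1}: two g's on the smallest feature
    for level in range(top_level - 1, 1, -1):
        sizes[level] = size
        size = g(size)
    return sizes


sizes = decoder_sizes(256)  # 256 is an assumed example input side
print(sizes)
```

Under these assumptions phi6 comes out at h/16 and phi2 at the full input size, which is the mismatch the question points at.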

zengyh1900 commented 5 years ago

Hi @PaTricksStar, I'm really sorry for my late response. As you mentioned, when L=7 there are six downsampling operations (with stride=2), so the size of phi6 should be [h/2^5, w/2^5] = [h/32, w/32].
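As a quick check of the exponent, the sketch below tabulates h / 2^k for k stride-2 downsampling steps. The 256-pixel input side is an assumed example value, and exact halving at every step is also an assumption.

```python
def size_after_downsamples(h, k):
    """Side length after k stride-2 downsampling steps, assuming exact halving."""
    return h // (2 ** k)


h = 256  # assumed example input side, not taken from the paper
table = {k: size_after_downsamples(h, k) for k in range(7)}
for k, s in table.items():
    print(f"k={k}: {s} (h/{2 ** k})")
```

With k = 5 the side length is h/32, and with k = 6 it is h/64.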

Finally, thank you for your interest :smile:

PaTricksStar commented 5 years ago

If Phi7 is h/64 and phi6 = g(psi6 + g(Phi7)), then phi6 is h/16, since there are two g. Maybe I'm missing some details? I'll check it later.

zengyh1900 commented 5 years ago

Oh, I see, you're right. Thank you for your reminder!

Eq. 5 is confusing. To keep it consistent with Table 8 in the supplementary material, I will update Eq. 5 as below:

[image]

Is this expression clearer now?

Thanks,

PaTricksStar commented 5 years ago

Yes, that seems reasonable now.