imlixinyang / HiSD

Code for "Image-to-image Translation via Hierarchical Style Disentanglement" (CVPR 2021 Oral).

question about paper #9

Closed diaodeyi closed 3 years ago

diaodeyi commented 3 years ago

Hi, thanks for your beautiful work. I want to know the reason for the design of the m and f in the translator. Is there any reference work, or did you design this by experiment? Also, your paper mentions "The attention mask in our translator is both spatial-wise and channel-wise." Can you explain this specifically?

imlixinyang commented 3 years ago

Learning an unsupervised mask is not new in the image-to-image translation area. This design encourages the translation to focus on a specific region of the image rather than the whole image. You can find works that also use this design in Sec. 3.3 of the paper.

After encoding, the image feature has size c×h×w, where c is the number of channels. In our experiments, we found that using only a spatial-wise mask (of size h×w) failed to help our disentanglement (see the ablation study in Sec. 4.3). Therefore, we made the mask channel-wise as well (of size c×h×w). We think this is because the encoder has not encoded the image into a highly disentangled feature space (where the feature of each tag would be separated into specific channels of the feature). You can also change the code ("Class Translator" in /core/networks.py) if you want to try other settings.
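To make the idea above concrete, here is a minimal PyTorch sketch (not the actual HiSD code; the module and layer choices are hypothetical) of a translator step that predicts a full c×h×w mask m, so attention is both channel-wise and spatial-wise, and uses it to blend the translated feature f with the input feature:

```python
import torch
import torch.nn as nn

class MaskedTranslator(nn.Module):
    """Hypothetical sketch: a translator that predicts a c x h x w
    attention mask (channel- AND spatial-wise) and blends the
    translated feature with the original feature."""

    def __init__(self, channels):
        super().__init__()
        # m: one mask value per channel AND per spatial location
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),  # mask values in [0, 1]
        )
        # f: the translated feature (stand-in for the real translation branch)
        self.translate = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feature):
        m = self.mask_head(feature)   # (B, C, H, W) mask
        f = self.translate(feature)   # (B, C, H, W) translated feature
        # Where m is near 0, the input channel/location passes through
        # unchanged, so the translation only touches relevant parts.
        return m * f + (1 - m) * feature

x = torch.randn(2, 64, 32, 32)
out = MaskedTranslator(64)(x)
print(out.shape)
```

A spatial-only mask would instead have shape (B, 1, H, W) and be broadcast over channels; as discussed above, that variant cannot suppress individual channels, which the paper's ablation found insufficient for disentanglement.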

diaodeyi commented 3 years ago

Thank you