Closed: dalessioluca closed this issue 4 years ago
I have realized that the model can be improved by making the generation of the background and foreground conditionally dependent. It is unclear whether the background should be generated first and then the foreground, or the other way around.
In the inference I currently use a trick that rescales the bounding box probabilities based on (img_raw - bg), which clearly suggests that, in the inference, the background is inferred first and the foreground is conditionally dependent on it.
I am pretty sure this trick can be removed by extending the model and making the foreground and background conditionally dependent.
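For concreteness, here is a minimal sketch of that inference order, background first and foreground conditioned on the residual (img_raw - bg). The names (`encoder_bg`, `encoder_box`, `cell_size`) are hypothetical placeholders, not the repo's actual code:

```python
import torch
import torch.nn.functional as F

def infer_step(img_raw, encoder_bg, encoder_box, cell_size=8):
    """Sketch of the inference order described above:
    background first, foreground rescaled by what the background misses."""
    bg = encoder_bg(img_raw)                        # 1. infer the background
    prob_box = torch.sigmoid(encoder_box(img_raw))  # raw box probabilities, (B, 1, H/c, W/c)

    # 2. the (img_raw - bg) trick: downweight boxes in cells the background
    #    already explains, i.e. rescale by the residual energy per grid cell
    residual = (img_raw - bg).abs().mean(dim=1, keepdim=True)  # (B, 1, H, W)
    weight = F.avg_pool2d(residual, cell_size)                 # (B, 1, H/c, W/c)
    prob_box = prob_box * weight / (weight.amax(dim=(2, 3), keepdim=True) + 1e-6)
    return bg, prob_box
```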
One idea to improve this is to take the mask, send it to a CNN, and use z_what as the AdaIN mu and std. The idea is to take the mask and massage it with content by changing the activations of the conv filters using z_what. One doesn't need to condition z_what on z_mask: since the image is generated by modulating the mask, the necessary mask/content correlations will be learnt automatically. Just a thought ;-) We justify conditioning the inference of z_what on z_mask because, well, you need both the image AND the mask to infer z_what, even if z_mask is independent of z_what.
AdaIN is this paper: https://arxiv.org/pdf/1703.06868.pdf
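A rough PyTorch sketch of that AdaIN idea, with the mask going through a small CNN and z_what mapped to per-channel (mu, std) that modulate the activations. The module names, shapes, and channel counts are illustrative assumptions, not this repo's implementation:

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive Instance Normalization: normalize per-channel statistics of x,
    then replace them with (mu, std) predicted from z_what."""
    def forward(self, x, mu, std):
        # x: (B, C, H, W); mu, std: (B, C)
        x_norm = (x - x.mean(dim=(2, 3), keepdim=True)) / (x.std(dim=(2, 3), keepdim=True) + 1e-5)
        return std.unsqueeze(-1).unsqueeze(-1) * x_norm + mu.unsqueeze(-1).unsqueeze(-1)

class MaskToImage(nn.Module):
    """Send the mask through a small CNN; z_what injects content via AdaIN."""
    def __init__(self, z_dim, ch=32):
        super().__init__()
        self.conv1 = nn.Conv2d(1, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, 3, 3, padding=1)
        self.adain = AdaIN()
        self.to_mu_std = nn.Linear(z_dim, 2 * ch)   # z_what -> (mu, std) per channel

    def forward(self, mask, z_what):
        mu, std = self.to_mu_std(z_what).chunk(2, dim=-1)  # std left unconstrained in this sketch
        h = torch.relu(self.conv1(mask))
        h = self.adain(h, mu, std)                   # modulate the mask features with content
        return torch.sigmoid(self.conv2(h))          # rendered foreground appearance
```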
Still, the issue is solved since, in the new graphical model, there is only one Z which is decoded to both the image and the mask.
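A minimal sketch of such a decoder, assuming a single latent z and a shared trunk with two output heads (all names and sizes below are hypothetical):

```python
import torch
import torch.nn as nn

class SingleZDecoder(nn.Module):
    """One latent z decoded through a shared trunk into both image and mask,
    so mask/content correlations live in the single latent."""
    def __init__(self, z_dim, ch=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(z_dim, ch * 8 * 8), nn.ReLU())
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.to_img = nn.Conv2d(ch, 3, 3, padding=1)   # image head
        self.to_mask = nn.Conv2d(ch, 1, 3, padding=1)  # mask head

    def forward(self, z):
        h = self.trunk(z).view(z.size(0), -1, 8, 8)
        h = self.deconv(h)
        img = torch.sigmoid(self.to_img(h))            # decoded appearance
        mask = torch.sigmoid(self.to_mask(h))          # decoded mask
        return img, mask
```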
Implement the new graphical model where: