kkaiwwana / MVPbev

[ACM MM24 Poster] Official implementation of paper "MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability"

some questions about paper #3

Open KevenLee opened 1 week ago

KevenLee commented 1 week ago

I have several questions:

  1. The article does not explicitly encode 3D traffic participants. Given the lack of constraints, how do you ensure that generated 3D objects obey normal scene logic? For example, could a car in a perspective image appear on top of a tree or on the surface of a river?
  2. Do the generated images need further annotation? If so, how should they be annotated?
  3. The testing process allows for control over instances, but it seems that it cannot precisely control a specific instance.

kkaiwwana commented 1 week ago

  1. In fact, we do not strictly implement foreground object generation (i.e., 3D bbox -> perspective objects); you may notice that we report no object-generation metrics (e.g., IoU). Technically, generating foreground objects violates our assumption that the background is far enough away relative to the discrepancy between camera locations. Under that assumption, homography estimation can be applied to ensure consistent background generation, and that is what we highlight in our paper (see the homography sketch after this list). By comparison, foreground objects are much closer to the cameras, which is beyond the scope of our work (and we did not design a method for it). In short, foreground objects can NOT be handled in the 2D setting in which our method operates. In our main paper, Sec. 3.2, we state: "Assuming that instance-level masks can be obtained at each view with either existing methods or simple retrieval"; we explain this in more detail in supplementary material Sec. 5 (it should be available; let me know if it is not). This is a feasible solution, and notably, our foreground instance control is based exactly on that assumption: we can obtain an instance mask in the perspective view, regardless of how that mask is produced. That is why we directly segment objects' masks from the ground-truth image.

  2. No. The ideal way to use these generative methods is: i. plausible annotations are available (from some generation method); ii. images are generated along with those annotations for a downstream task. A minimal sketch of this workflow is given after this list.

  3. On the contrary, it can precisely control a specific instance. We explain the method in detail in the supplementary material. Check our implementation code (tip: only the class `AttnProcessor` is used; the other two functions are deprecated) and the demo notebook. A toy sketch of the mask-gated attention idea is also given after this list.
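
To make the far-background assumption from answer 1 concrete: when the scene is distant relative to the baseline between neighboring cameras, the translation between views can be ignored and pixels map between views through the infinite homography H = K2 · R · K1⁻¹. The sketch below is illustrative only, not this repository's code; `K1`/`K2` are assumed per-view intrinsics and `R_rel` the relative rotation between the two cameras.

```python
import numpy as np
import cv2

def infinite_homography(K1: np.ndarray, K2: np.ndarray, R_rel: np.ndarray) -> np.ndarray:
    """3x3 homography mapping view-1 pixels to view-2 pixels.

    Valid only when camera translation is negligible compared to scene depth,
    i.e. the far-background assumption: H = K2 @ R_rel @ inv(K1).
    """
    return K2 @ R_rel @ np.linalg.inv(K1)

def warp_to_neighbor(img1: np.ndarray, K1, K2, R_rel, out_size) -> np.ndarray:
    """Warp view 1 into view 2's frame; overlapping background should agree."""
    H = infinite_homography(K1, K2, R_rel)
    return cv2.warpPerspective(img1, H, out_size)  # out_size = (width, height)
```

Foreground objects break this model precisely because they are close to the cameras, which is why the answer above rules them out.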
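For answer 2, here is a hypothetical end-to-end sketch of the intended usage; `sample_bev_layout` and `mvpbev_generate` are placeholder names, not functions from this repository:

```python
from typing import Any

def sample_bev_layout() -> Any:
    """Placeholder: produce a plausible BEV semantic layout (step i)."""
    ...

def mvpbev_generate(bev_layout: Any) -> Any:
    """Placeholder: run the multi-view generator conditioned on the layout (step ii)."""
    ...

# The conditioning layout doubles as the label, so no manual annotation is needed.
layout = sample_bev_layout()
images = mvpbev_generate(layout)
training_pair = (images, layout)  # ready-made (image, annotation) pair for a downstream task
```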
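For answer 3, the repository's `AttnProcessor` and the demo notebook are authoritative; the following is only a minimal sketch of the general idea (mask-gated cross-attention in a diffusers-style attention processor), with the masking and blending details assumed rather than taken from the paper:

```python
import torch

class InstanceAttnProcessor:
    """Illustrative only: queries inside an instance mask attend to a
    per-instance prompt embedding instead of the global text embedding."""

    def __init__(self, instance_mask: torch.Tensor, instance_embeds: torch.Tensor):
        self.instance_mask = instance_mask      # (H*W,) bool, resized to the attention resolution
        self.instance_embeds = instance_embeds  # (1, seq_len, dim) instance prompt embedding

    def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None):
        is_cross = encoder_hidden_states is not None
        context = encoder_hidden_states if is_cross else hidden_states

        # Standard diffusers-style attention path.
        query = attn.head_to_batch_dim(attn.to_q(hidden_states))
        key = attn.head_to_batch_dim(attn.to_k(context))
        value = attn.head_to_batch_dim(attn.to_v(context))
        out = torch.bmm(attn.get_attention_scores(query, key, attention_mask), value)

        if is_cross:
            # Second pass against the instance embedding, then blend the two
            # results using the flattened spatial mask of that instance.
            emb = self.instance_embeds.expand(hidden_states.shape[0], -1, -1)
            ikey = attn.head_to_batch_dim(attn.to_k(emb))
            ival = attn.head_to_batch_dim(attn.to_v(emb))
            iout = torch.bmm(attn.get_attention_scores(query, ikey, None), ival)

            m = self.instance_mask.to(out.dtype).view(1, -1, 1)  # broadcast over heads/dims
            out = m * iout + (1.0 - m) * out

        out = attn.batch_to_head_dim(out)
        out = attn.to_out[0](out)   # output projection
        return attn.to_out[1](out)  # dropout
```

Because the gate only needs a perspective-view mask, it does not matter how that mask was obtained, which matches the assumption quoted from Sec. 3.2.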