Closed by Huster-Hq 4 months ago
Yes.
I have a question about the details of the Object Memory:
Isn't $W$ generated from the memory feature $F$ through an MLP?
What do you mean by "constraint label"? $W$ is constructed directly from $M_l$ in the screenshot that you provided. There are no additional transformations. Those masks are just the masks in Figure 4 (and their inverses).
Figure 4 shows $M_l$ rather than the pooling masks $W$.
Oh, right. Sorry -- it slipped my mind. We have visualized them before at some point. IIRC those masks are rather diffuse and don't have very recognizable patterns. They are learned end-to-end.
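For anyone following along, here is a minimal sketch of what pooling with soft masks can look like, assuming the pooling masks $W$ act as non-negative per-pixel weights for averaging features. The function name, the shapes, and the random sigmoid masks (a stand-in for learned, diffuse masks) are all hypothetical and not taken from the repository.

```python
import numpy as np

def masked_average_pool(features, masks, eps=1e-6):
    """Weighted average of pixel features under each mask.

    features: (H*W, C) flattened pixel features (hypothetical layout).
    masks:    (H*W, N) non-negative per-pixel weights, one column per mask.
    Returns:  (N, C), one pooled feature vector per mask.
    """
    weighted_sum = masks.T @ features        # (N, C)
    mask_area = masks.sum(axis=0)[:, None]   # (N, 1), total weight per mask
    return weighted_sum / (mask_area + eps)  # eps guards against empty masks

rng = np.random.default_rng(0)
height, width, channels, num_masks = 4, 4, 8, 3

features = rng.normal(size=(height * width, channels))
# Stand-in for learned soft masks: sigmoid of random logits (assumption).
logits = rng.normal(size=(height * width, num_masks))
masks = 1.0 / (1.0 + np.exp(-logits))

pooled = masked_average_pool(features, masks)
print(pooled.shape)  # (3, 8)
```

With a mask of all ones this reduces to plain global average pooling, which is a quick sanity check on the weighting.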
Is the mask prediction single channel, i.e., H×W×1?