joonsu0109gh closed 1 year ago
Thanks for your interest in our research! Your question delves into a crucial aspect, and I would like to provide the following elaboration from two perspectives.
Methodology Perspective: In terms of methodology, our primary objective is to capture the semantics of "instances". In practice, however, our implementation models the corresponding "regional" features through deformable attention. As you correctly pointed out, there is a discernible gap between our initial motivation and the actual implementation. Even so, a significant degree of alignment is maintained, which we attribute to the large number of queries employed and the enhanced instance-awareness provided by our pre-trained encoder. We acknowledge that this choice is a compromise arising from the absence of instance-level annotations.
Experimental Perspective: In our experiments, we indirectly substantiate the claimed "instance" attributes through superior IoU scores on instance classes. Detailed visualization results for specific instance classes, such as cars and trunks, provide further evidence. As an ongoing effort, we are visualizing the intermediate representations of the queries, which should address your question more directly. We are committed to releasing these updates in due course.
Your inquiry highlights an insightful aspect of our work, and we appreciate your diligence in seeking clarification. If you have any further questions or if there's anything else you would like to explore, please feel free to reach out. Thank you for your engagement!
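For readers following the Methodology Perspective above, the difference between a supervised "instance" and a "regional" feature gathered by a learnable query can be illustrated with a toy example. The sketch below is not the Symphonies implementation: it assumes a 1D feature map instead of image or voxel features, a single attention head, and hand-set projection matrices in place of learned ones. All names (`sample_linear`, `query_update`, `W_off`, `W_att`) are hypothetical and chosen only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sample_linear(features, pos):
    """Linearly interpolate a 1D feature map at a fractional position.

    `features` is a list of feature vectors; positions outside the map
    are clamped to its borders.
    """
    pos = min(max(pos, 0.0), len(features) - 1.0)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(features) - 1)
    t = pos - lo
    return [(1 - t) * a + t * b for a, b in zip(features[lo], features[hi])]


def query_update(features, query, ref_point, W_off, W_att):
    """One simplified deformable-attention step for a single query.

    The query embedding predicts sampling offsets around its reference
    point (W_off) and attention logits (W_att). The output is a weighted
    sum of features sampled from that local region. No instance mask or
    instance-level loss is involved anywhere in this step.
    """
    offsets = matvec(W_off, query)
    weights = softmax(matvec(W_att, query))
    out = [0.0] * len(features[0])
    for off, w in zip(offsets, weights):
        sampled = sample_linear(features, ref_point + off)
        out = [o + w * s for o, s in zip(out, sampled)]
    return out


# Toy feature map whose single channel equals its position index.
features = [[0.0], [1.0], [2.0], [3.0], [4.0]]
query = [1.0, 0.0]                   # stands in for a learnable embedding
W_off = [[-1.0, 0.0], [1.0, 0.0]]    # two sampling points at -1 and +1
W_att = [[0.0, 0.0], [0.0, 0.0]]     # zero logits give uniform weights
region_feature = query_update(features, query, 2.0, W_off, W_att)
```

In this sketch the query only ever aggregates whatever lies near its reference point, which is why the result is better described as a regional feature. Whether such a region coincides with an object instance depends on what the features and the trained projections encode, which is consistent with the compromise described in the reply.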
I am deeply grateful for your kind and detailed explanations!
As you said, and especially after seeing the 'Symphonies w/o Instance queries' visualization in the supplementary section of your paper, it does seem that instance queries contribute significantly to scene completion.
Your research has been a great source of inspiration, and I sincerely support your work. Thank you!
I am interested in your research and have a question.
The paper mentions using instance queries to capture global features, which I agree is a reasonable expectation.
However, I am curious why they are named 'instance' queries when they rely solely on learnable embeddings, without an instance-specific loss or instance-based proposals.
Is there any observation of instances being captured, or have I missed something?
Please advise. Thank you :)