pkqbajng closed this issue 11 months ago.
Thank you for reaching out and for your interest in our work. Yes, the "Pre-trained Encoder" in Tab. 3 refers specifically to the "Deformable Transformer Encoder" (also called the "Image Encoder"), initialized with weights pre-trained on panoptic segmentation.
Throughout the paper, "Encoder" always refers to this same "Deformable Transformer Encoder" / "Image Encoder".
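To make the distinction concrete, here is a minimal sketch of initializing only the encoder from a panoptic segmentation checkpoint while leaving every other module untouched. It assumes a PyTorch-style state dict (a mapping from parameter names to tensors); the `encoder.` prefix, the key names, and the helper `load_encoder_weights` are hypothetical and may not match the actual repository.

```python
from typing import Dict, List, Mapping, TypeVar

T = TypeVar("T")  # stands in for a tensor type; plain values work for illustration


def load_encoder_weights(
    model_state: Dict[str, T],
    pretrained_state: Mapping[str, T],
    prefix: str = "encoder.",  # hypothetical prefix for encoder parameters
) -> List[str]:
    """Copy only the entries under `prefix` from a pre-trained checkpoint.

    Keys outside `prefix` (e.g. a panoptic head) are ignored, and keys the
    model does not have are skipped. Returns the list of overwritten keys.
    """
    loaded = []
    for key, value in pretrained_state.items():
        if key.startswith(prefix) and key in model_state:
            model_state[key] = value
            loaded.append(key)
    return loaded


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    model = {
        "encoder.layers.0.attn.weight": "random",
        "decoder.query_embed": "random",
    }
    panoptic_ckpt = {
        "encoder.layers.0.attn.weight": "panoptic",
        "panoptic_head.cls.weight": "panoptic",
    }
    print(load_encoder_weights(model, panoptic_ckpt))
    print(model)
```

With a real PyTorch model, the filtered entries could instead be passed to `model.load_state_dict(filtered, strict=False)`, which tolerates the keys that are not provided.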
I'd also like to mention that we updated our paper on arXiv yesterday, with significantly improved performance, revised conclusions, and more consistent terminology. I highly recommend referring to the latest version for the most up-to-date and complete information.
Thank you once again for your interest, and please feel free to reach out if you have any further questions or need additional clarification.
Hi, thanks for your excellent work. I'm confused about the pre-trained encoder mentioned in the ablation study in Table 3. Does it refer specifically to the transformer encoder?