Question about the Pre-trained Encoder in ablation studies.

Thank you for reaching out and for your interest in our work. The term "Pre-trained Encoder" in Tab. 3 refers specifically to the "Deformable Transformer Encoder" / "Image Encoder", which additionally incorporates pre-trained weights from panoptic segmentation.

Throughout the paper, "Encoder" consistently refers to this same "Deformable Transformer Encoder" / "Image Encoder".

I'd also like to share that we updated our paper on arXiv yesterday. The new version significantly improves performance, revises the conclusions, and aligns the wording for clarity. I highly recommend referring to the latest version for the most up-to-date and comprehensive information.

Thank you once again for your interest, and please feel free to reach out if you have further questions or need additional clarification.
