hustvl / Symphonies

[CVPR 2024] Symphonies (Scene-from-Insts): Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
https://arxiv.org/abs/2306.15670
MIT License
158 stars, 7 forks

Question about the Pre-trained Encoder in ablation studies. #7

Closed: pkqbajng closed this issue 10 months ago

pkqbajng commented 10 months ago

Hi, thanks for your excellent work. I'm confused about the "Pre-trained Encoder" mentioned in the ablation study in Table 3. Does it refer specifically to the transformer encoder?

npurson commented 10 months ago

Thank you for reaching out and for your interest in our work. Regarding the term "Pre-trained Encoder" mentioned in Tab. 3, it indeed refers to the "Deformable Transformer Encoder" / "Image Encoder" specifically, which further incorporates pre-trained weights from panoptic segmentation.

Throughout the paper, "Encoder" consistently refers to this same "Deformable Transformer Encoder" / "Image Encoder".
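For readers unfamiliar with the pattern, here is a minimal sketch of reusing only the encoder weights from a checkpoint trained on another task, such as panoptic segmentation. It is not the repository's actual loading code: the function name `select_encoder_weights`, the `encoder.` key prefix, and the toy checkpoint keys are all hypothetical and chosen purely for illustration.

```python
def select_encoder_weights(checkpoint, prefix="encoder."):
    """Keep only the entries stored under `prefix` and strip the prefix.

    `checkpoint` is a flat mapping from parameter names to values, in the
    style of a state dict. The default prefix is an assumption; real
    checkpoints may name their encoder parameters differently.
    """
    return {
        key[len(prefix):]: value
        for key, value in checkpoint.items()
        if key.startswith(prefix)
    }


# Toy stand-in for a panoptic segmentation checkpoint (hypothetical keys,
# with short lists in place of real tensors).
panoptic_ckpt = {
    "backbone.conv1.weight": [0.1],
    "encoder.layers.0.attn.weight": [0.2],
    "encoder.layers.0.ffn.weight": [0.3],
    "decoder.query_embed.weight": [0.4],
}

encoder_weights = select_encoder_weights(panoptic_ckpt)
print(sorted(encoder_weights))
```

In PyTorch, a dictionary filtered this way would typically be passed to the encoder module's `load_state_dict(..., strict=False)`, so that only the matching parameters are initialized from the pre-trained checkpoint.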

I'd also like to share that we updated our paper on arXiv yesterday, significantly improving performance, revising the conclusions, and aligning terminology for clarity. I highly recommend referring to the latest version for the most up-to-date and comprehensive information.

Thank you once again for your interest and please feel free to reach out if you have any further inquiries or need additional clarification.