puxuntu closed this issue 3 months ago
Hi @puxuntu, thanks for your interest! SVT is designed for general natural videos, while our Endo-FM is tailored to endoscopy scenarios. Endo-FM aims to handle the various lesions, tissues, and so on that appear in the endoscopy stream. Although both methods are built upon DINO with a similar architecture, they solve entirely different problems.
@Kyfafyd Thanks for your response!
Hi,
This is really great work, and I have learned a lot from it. I am planning to cite both your Endo-FM paper and the Self-supervised Video Transformer (SVT) work published at CVPR 2022. However, I am unsure about the architectural differences between Endo-FM and SVT. From what I can see, they seem quite similar, and I couldn't find a discussion of their differences in your paper. Could you please clarify any differences between the two architectures so that I can describe them accurately in my work? Thank you for your help!