med-air / Endo-FM

[MICCAI'23] Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train
Apache License 2.0

Question About Endo-FM and SVT Architecture Differences #19

Closed puxuntu closed 3 months ago

puxuntu commented 3 months ago

Hi,

This is really great work, and I have learned a lot from it. I am planning to cite both your Endo-FM paper and the Self-supervised Video Transformer (SVT) work published at CVPR 2022. However, I am unsure about the architectural differences between Endo-FM and SVT. From what I can see, they are quite similar, and I couldn't find a discussion of their differences in your paper. Could you please clarify any architectural differences between the two so that I can describe them accurately in my work? Thank you for your help!

Kyfafyd commented 3 months ago

Hi @puxuntu, thanks for your interest! SVT is designed for general natural videos, while our Endo-FM is tailored to endoscopy scenarios. Endo-FM aims to handle the various lesions, tissues, and so on that appear in the endoscopy stream. Although both methods are built upon DINO and share a similar architecture, they solve entirely different problems.
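For readers unfamiliar with DINO, which the reply above names as the common basis of both methods, here is a minimal sketch of a DINO-style self-distillation objective. It is not code from Endo-FM or SVT. The function names (`dino_loss`, `ema_update`), the temperature and momentum values, and the toy logits are all illustrative assumptions, and the sketch covers a single pair of views rather than the multi-view video sampling that either method may use.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy from teacher to student for one pair of views.

    The teacher output is centered (to discourage collapse) and sharpened
    with a low temperature. Temperatures here are illustrative defaults.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(v) for v in log_softmax(centered, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher weights track the student by exponential moving average."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


# Toy usage: the loss is small when the student agrees with the teacher
# and large when it puts its mass on a different output dimension.
zero_center = [0.0, 0.0, 0.0]
aligned = dino_loss([2.0, 0.0, 0.0], [2.0, 0.0, 0.0], zero_center)
misaligned = dino_loss([0.0, 2.0, 0.0], [2.0, 0.0, 0.0], zero_center)
```

In this kind of setup only the student receives gradients. The teacher is updated through `ema_update`, which is why the two networks can share one architecture while still producing a useful training signal.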

puxuntu commented 3 months ago

@Kyfafyd Thanks for your response!