med-air / Endo-FM

[MICCAI'23] Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train
Apache License 2.0

Questions about downstream tasks #15

Open Yipinggggg opened 3 months ago

Yipinggggg commented 3 months ago

Hi, great work! I have a question about something I don't understand.

The backbone you used for training is a TimeSformer, which takes a sequence of frames as input, but for all the downstream tasks the input is a single frame. Maybe I haven't fully understood the code, but what role does the time dimension play in the downstream tasks?

Thank you very much!

Kyfafyd commented 3 months ago

Hi @Yipinggggg, thanks for your interest! All of our downstream tasks take video sequences as the model input in order to capture temporal information. May I ask which part of the code is confusing?
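
For anyone reading along, here is a minimal sketch of what "video sequence as input" usually means for a video transformer. It is not taken from the Endo-FM code: the helper names `sample_clip_indices` and `clip_shape`, the uniform sampling strategy, the 224x224 resolution, and the (B, C, T, H, W) layout are all assumptions made for illustration. The point is only that the model receives a clip with an explicit time axis T rather than a single image.

```python
def sample_clip_indices(num_video_frames: int, clip_len: int) -> list[int]:
    """Pick clip_len frame indices spread evenly across a video.

    Hypothetical helper for illustration; the sampling used by a real
    data loader may differ.
    """
    if num_video_frames <= 0 or clip_len <= 0:
        raise ValueError("num_video_frames and clip_len must be positive")
    # Take the centre of each of clip_len equal-length segments.
    # Indices repeat when the video has fewer frames than clip_len.
    return [
        min(num_video_frames - 1, int((i + 0.5) * num_video_frames / clip_len))
        for i in range(clip_len)
    ]


def clip_shape(batch_size: int, clip_len: int,
               channels: int = 3, height: int = 224, width: int = 224):
    """Shape of a batched clip in the (B, C, T, H, W) layout assumed here."""
    return (batch_size, channels, clip_len, height, width)


indices = sample_clip_indices(num_video_frames=100, clip_len=8)
print(indices)                                          # 8 frame indices
print(clip_shape(batch_size=2, clip_len=len(indices)))  # time axis T = 8
```

Under these assumptions, a single-frame input would simply be the special case T = 1, so checking the size of the time axis in the tensor passed to the backbone is a quick way to see whether a downstream task is really fed one frame or a clip.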