med-air / Endo-FM

[MICCAI'23] Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train
Apache License 2.0
146 stars, 14 forks

Issues about attention type. #1

Closed: 16rq closed this issue 8 months ago

16rq commented 10 months ago

Thank you for your work. It is great!

I have a question from running your code: what is the difference between the 'only_space' and 'time_space_joint' attention types? They look identical in the code.

Kyfafyd commented 10 months ago

Thanks for your interest! 'space_only' performs only spatial attention, while 'time_space_joint' performs both spatial and temporal attention, as you can see in the following code:
https://github.com/med-air/Endo-FM/blob/1b33496d7219e95f3f617a967616b808acaa2a71/models/timesformer.py#L233-L235
https://github.com/med-air/Endo-FM/blob/1b33496d7219e95f3f617a967616b808acaa2a71/models/timesformer.py#L312-L327
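The following is a minimal NumPy sketch of how two attention types can call the same attention code and still behave differently. It assumes a TimeSformer-style design where the only difference is how the video tokens are grouped before the attention call. The function names (`space_only_attention`, `joint_attention`) and the tensor sizes are hypothetical. The sketch also leaves out learned projections, multiple heads, the class token, and the time embeddings, so it is not Endo-FM's implementation.

```python
import numpy as np


def self_attention(x):
    """Single-head self-attention over the token axis.

    x: (batch, tokens, dim). Uses x directly as query, key and value
    (no learned projections) to keep the sketch minimal.
    Returns (output, weights); weights has shape (batch, tokens, tokens).
    """
    d = x.shape[-1]
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights


def space_only_attention(x):
    """Hypothetical helper. x: (B, T, N, D).

    Time is folded into the batch, so each frame attends only to its own N patches.
    """
    B, T, N, D = x.shape
    out, w = self_attention(x.reshape(B * T, N, D))
    return out.reshape(B, T, N, D), w


def joint_attention(x):
    """Hypothetical helper. x: (B, T, N, D).

    Time and space are flattened into one sequence of T*N tokens per clip.
    """
    B, T, N, D = x.shape
    out, w = self_attention(x.reshape(B, T * N, D))
    return out.reshape(B, T, N, D), w


rng = np.random.default_rng(0)
B, T, N, D = 2, 4, 9, 8  # hypothetical: 4 frames, 3x3 patches, 8-dim tokens
x = rng.standard_normal((B, T, N, D))

out_s, w_s = space_only_attention(x)
out_j, w_j = joint_attention(x)
print(w_s.shape)  # (8, 9, 9): one 9x9 attention map per frame
print(w_j.shape)  # (2, 36, 36): one 36x36 map per clip, spanning all frames

# Perturb frame 1 only. Frame 0's output is unchanged under space-only
# attention, but changes under joint attention because frame 0 attends
# to frame 1's tokens.
x2 = x.copy()
x2[:, 1] += 1.0
print(np.allclose(space_only_attention(x2)[0][:, 0], out_s[:, 0]))  # True
print(np.allclose(joint_attention(x2)[0][:, 0], out_j[:, 0]))       # False
```

Both helpers call the same `self_attention` function. The reshape before the call decides whether attention stays within a frame (spatial only) or spans every frame of the clip (spatial and temporal jointly). This is one way the per-block attention code can look identical for the two types while the surrounding token layout makes them different.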