ysj9909 / SHViT

[CVPR 2024] SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design

Utilizing Model for Downstream Tasks #2

Closed: altair199797 closed this issue 1 week ago

altair199797 commented 1 month ago

Dear Seokju Yun and Youngmin Ro,

I am trying to use your model for downstream tasks. Unfortunately, when I simply take your architecture (loading your checkpoint) and plug it into something like a RetinaNet, it does not learn anything, while other models do.

I see that you apply several changes to use your architecture in RetinaNet (including a different FPN), but which of those changes are necessary if I want to use a normal FPN? To be honest, I don't really understand your code in SHViT/downstream/shvit.py.
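For reference, this is roughly what my wiring looks like, reduced to a minimal sketch (`shvit_body` and the channel list are placeholders for my actual setup, not code from your repo):

```python
import torch
from torch import nn
from collections import OrderedDict
from torchvision.models.detection import RetinaNet
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.ops import FeaturePyramidNetwork


class BackboneWithPlainFPN(nn.Module):
    """Wrap a multi-scale feature extractor with a standard torchvision FPN."""

    def __init__(self, body, in_channels_list, out_channels=256):
        super().__init__()
        self.body = body                  # must return a list of feature maps, finest first
        self.fpn = FeaturePyramidNetwork(in_channels_list, out_channels)
        self.out_channels = out_channels  # RetinaNet reads this attribute

    def forward(self, x):
        feats = self.body(x)              # e.g. [C3, C4, C5]
        feats = OrderedDict((str(i), f) for i, f in enumerate(feats))
        return self.fpn(feats)


# `shvit_body` stands in for the checkpointed SHViT feature extractor;
# the channel list is a placeholder, not the real stage widths.
backbone = BackboneWithPlainFPN(shvit_body, in_channels_list=[224, 336, 448])
anchor_gen = AnchorGenerator(
    sizes=((32,), (64,), (128,)),             # one tuple per pyramid level
    aspect_ratios=((0.5, 1.0, 2.0),) * 3,
)
model = RetinaNet(backbone, num_classes=91, anchor_generator=anchor_gen)
```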

Basically my question is: Were you unable to train SHViT in RetinaNet without these modifications, or am I just stupid? ;)

Best regards, Moritz Nottebaum

ysj9909 commented 1 month ago

Thank you for your interest in our research! The SHViT/downstream/shvit.py file primarily calculates the multi-scale features that serve as inputs for the FPN. Our macro design utilizes a 3-stage structure rather than the conventional 4-stage structure. This difference is accounted for in the SHViT/downstream/efficientvit_fpn.py file. Therefore, to use SHViT as the backbone for dense prediction tasks, you need to modify the FPN code to accommodate the stage configuration.
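For illustration, here is a minimal sketch of that adjustment using torchvision's generic FeaturePyramidNetwork (the channel numbers are placeholders, not our actual configuration; our repo instead adapts the EfficientViT FPN in SHViT/downstream/efficientvit_fpn.py):

```python
from torchvision.ops import FeaturePyramidNetwork
from torchvision.ops.feature_pyramid_network import LastLevelP6P7

# A conventional 4-stage backbone would feed the FPN four levels, e.g.:
#   FeaturePyramidNetwork([256, 512, 1024, 2048], 256)
# SHViT's 3-stage macro design produces only three feature maps, so the
# FPN must be built with a 3-entry in_channels_list (placeholder values):
fpn = FeaturePyramidNetwork(
    in_channels_list=[224, 336, 448],      # one entry per SHViT stage
    out_channels=256,
    extra_blocks=LastLevelP6P7(256, 256),  # adds P6/P7, which RetinaNet expects
)
```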

altair199797 commented 1 month ago

Thank you for your quick answer! That was not the problem, but thanks for the clarification! I'll just try again.