altair199797 closed this issue 1 week ago
Thank you for your interest in our research! The SHViT/downstream/shvit.py file primarily calculates the multi-scale features that serve as inputs for the FPN. Our macro design utilizes a 3-stage structure rather than the conventional 4-stage structure. This difference is accounted for in the SHViT/downstream/efficientvit_fpn.py file. Therefore, to use SHViT as the backbone for dense prediction tasks, you need to modify the FPN code to accommodate the stage configuration.
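To make the stage mismatch concrete, here is a minimal sketch in plain Python (no detection framework) of the bookkeeping a standard FPN setup depends on. It is not taken from the SHViT repository. The helper names `plan_fpn` and `strides_match` are hypothetical, and the channel widths and strides in the example are placeholder assumptions, so substitute the values your backbone actually reports.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PyramidPlan:
    in_channels: List[int]    # channels of each backbone stage fed to the FPN
    level_strides: List[int]  # stride of each FPN output level


def plan_fpn(stage_channels: Sequence[int],
             stage_strides: Sequence[int],
             num_outs: int) -> PyramidPlan:
    """Derive FPN output strides, assuming each extra level halves the resolution."""
    if len(stage_channels) != len(stage_strides):
        raise ValueError("need one channel width per backbone stage")
    if num_outs < len(stage_strides):
        raise ValueError("num_outs must cover every backbone stage")
    strides = list(stage_strides)
    # Extra levels (e.g. P6/P7 in a RetinaNet-style head) are assumed to be
    # stride-2 downsamples of the coarsest existing level.
    while len(strides) < num_outs:
        strides.append(strides[-1] * 2)
    return PyramidPlan(list(stage_channels), strides)


def strides_match(plan: PyramidPlan, anchor_strides: Sequence[int]) -> bool:
    """True when the anchor strides line up with the pyramid levels."""
    return plan.level_strides == list(anchor_strides)


# Placeholder numbers for a 3-stage backbone (assumed, not read from SHViT).
three_stage = plan_fpn([128, 256, 384], [16, 32, 64], num_outs=5)

# Anchor strides commonly paired with a 4-stage backbone whose last three
# stages feed the FPN (also an assumption for this illustration).
default_anchor_strides = [8, 16, 32, 64, 128]

print(three_stage.level_strides)
print(strides_match(three_stage, default_anchor_strides))
```

If the check fails, either the FPN has to produce the levels the head expects or the anchor strides have to be changed to match the pyramid. Which of these the repository's FPN does is not covered by this sketch.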
Thank you for your fast answer! That was not the problem, but thank you for the clarification! I'll just try again.
Dear Seokju Yun and Youngmin Ro,
I am trying to use your model in downstream tasks. Unfortunately, when I simply take your architecture (loading your checkpoint) and plug it into something like a RetinaNet, it does not learn anything, while other models do.
I see that you apply several changes to use your architecture in RetinaNet (including a different FPN), but what portion of that is necessary when I want to use a normal FPN? To be honest, I don't really understand your code in SHViT/downstream/shvit.py.
Basically my question is: Were you unable to train SHViT in RetinaNet without these modifications, or am I just stupid? ;)
Best regards,
Moritz Nottebaum