zhmiao / OpenLongTailRecognition-OLTR

PyTorch implementation for "Large-Scale Long-Tailed Recognition in an Open World" (CVPR 2019 oral)
BSD 3-Clause "New" or "Revised" License

Why fix all parameters except the self-attention parameters? #57

Closed RainbowShirlley closed 3 years ago

RainbowShirlley commented 4 years ago

Hi, dear author: I'm wondering why, in stage 2, you fix all parameters except the self-attention parameters. Does that mean the feature model only learns features in stage 1, and stage 2 only trains the self-attention? Can we learn all the parameters in stage 2 instead? Thanks very much for your explanation!

zhmiao commented 4 years ago

Hello @RainbowShirlley, thanks for asking and sorry for the late reply. The reason we do not fix the self-attention module is that it is not trained in the first stage: it is randomly initialized at the beginning of the second stage, so it has to be trained during the second stage. Does that make sense?

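For readers who want to see how this kind of selective freezing can be done, here is a minimal PyTorch sketch. It assumes the self-attention parameters can be identified by a substring such as `'modulatedatt'` in their names; that key and the helper function are illustrative, not the exact code in this repository.

```python
# Minimal sketch: freeze the stage-1 pretrained backbone and keep only the
# (randomly initialized) self-attention parameters trainable in stage 2.
# The name key 'modulatedatt' is an assumption, not necessarily this repo's naming.
import torch.nn as nn


def freeze_all_but_attention(feat_model: nn.Module, attention_key: str = 'modulatedatt'):
    trainable = []
    for name, param in feat_model.named_parameters():
        param.requires_grad = attention_key in name
        if param.requires_grad:
            trainable.append(param)
    # Pass only `trainable` to the stage-2 optimizer so the frozen weights stay put.
    return trainable
```
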
valencebond commented 4 years ago

Hi @RainbowShirlley @zhmiao, according to stage_2_meta_embedding.py, the feature model parameters are not fixed; they are just trained with a small learning rate.

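That matches the idea of updating the pretrained backbone gently while the new module learns from scratch. A minimal sketch of how such per-module learning rates can be set up with PyTorch parameter groups follows; the `'modulatedatt'` filter and the specific learning rates are illustrative assumptions, not the exact values in stage_2_meta_embedding.py.

```python
# Minimal sketch, not the repo's exact configuration: the name filter
# 'modulatedatt' and the learning rates are illustrative assumptions.
import torch
import torch.nn as nn


def build_stage2_optimizer(feat_model: nn.Module):
    # The newly added self-attention starts from random init, so it gets a
    # normal learning rate; the stage-1 pretrained backbone gets a much
    # smaller one instead of being frozen outright.
    attention_params = [p for n, p in feat_model.named_parameters() if 'modulatedatt' in n]
    backbone_params = [p for n, p in feat_model.named_parameters() if 'modulatedatt' not in n]
    return torch.optim.SGD(
        [
            {'params': backbone_params, 'lr': 1e-4},
            {'params': attention_params, 'lr': 1e-2},
        ],
        momentum=0.9,
        weight_decay=5e-4,
    )
```
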
zhmiao commented 4 years ago

@RainbowShirlley Yes, but for the Places experiments, the backbone features are fixed in the second stage because of limited computational resources.

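If the backbone is fixed entirely, as in the Places setting described above, one detail worth noting is that setting `requires_grad = False` alone does not stop BatchNorm layers from updating their running statistics; a common extra step is to put the frozen model into eval mode. A small sketch under that assumption (not the repository's exact code):

```python
# Minimal sketch for fully fixing backbone features, as in the Places setup.
# Assumption: the attention/classifier modules live elsewhere and are trained
# separately; this helper only freezes the given feature model.
import torch.nn as nn


def fix_backbone(feat_model: nn.Module) -> nn.Module:
    for param in feat_model.parameters():
        param.requires_grad = False  # no gradient updates to backbone weights
    feat_model.eval()  # also freeze BatchNorm running statistics
    return feat_model
```
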
zhmiao commented 3 years ago

As it has been over two months, I will close this issue for now. If you have any more questions, you are welcome to reopen it. Thanks.