Hi
Thank you very much for sharing the code for this amazing work.
I have some naïve questions regarding the design choices for the ReCo loss.
1) Is there any specific reason you put Lines 130-136 under `torch.no_grad()`?
I'd appreciate it if you could share the insight behind that decision.
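For context, my (possibly wrong) understanding of `torch.no_grad()` is simply that anything computed inside the block is detached from the autograd graph and so never contributes to the backward pass, e.g.:

```python
import torch

# Minimal illustration (not the ReCo code itself): tensors produced
# inside a torch.no_grad() block do not track gradients.
x = torch.randn(3, requires_grad=True)
with torch.no_grad():
    y = x * 2   # y.requires_grad is False: detached from the graph
z = x * 2       # outside the block: z.requires_grad is True
```

So I assume the intent is that those lines should not receive gradients, but I would like to confirm.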
2) Based on your experiments/experience, how does performance change if the temperature is made smaller (e.g. 0.05) or larger (e.g. 1-2)?
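To make sure I understand where the temperature enters, here is a toy InfoNCE-style loss I have in mind (hypothetical shapes and names, not your exact implementation): the temperature divides all similarity logits before the softmax, so a smaller value sharpens the distribution over negatives.

```python
import torch
import torch.nn.functional as F

def info_nce(query, positive, negatives, temperature=0.5):
    """Toy InfoNCE sketch: query/positive are (N, D),
    negatives is (N, K, D); all assumed L2-normalized."""
    pos_logit = (query * positive).sum(dim=1, keepdim=True)   # (N, 1)
    neg_logit = torch.einsum('nd,nkd->nk', query, negatives)  # (N, K)
    # Temperature scales every logit, controlling softmax sharpness.
    logits = torch.cat([pos_logit, neg_logit], dim=1) / temperature
    labels = torch.zeros(query.size(0), dtype=torch.long)     # positive at index 0
    return F.cross_entropy(logits, labels)
```

My guess is that a very small temperature would make training focus heavily on the hardest negatives, but I am curious what you observed in practice.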
3) What is the strategy behind choosing the number of negative samples and queries? E.g. you use 512 negatives and 256 queries. What happens if we change those, and is there a rule of thumb we should keep in mind when tuning these numbers?
4) When I use a fully supervised model, the ReCo loss still improves the results. My question is: is the projection layer necessary in the fully supervised setting? Could you explain the reasoning in either case? Specifically, I am not sure why we still need `self.representation` in the fully supervised case, rather than using the features just before the classifier (i.e. `x = self.resnet_layer4(x)`).
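To make question 4 concrete, this is the contrast I have in mind (a hypothetical sketch; the layer names and dimensions are my guesses based on the repo, not your exact architecture):

```python
import torch
import torch.nn as nn

class SegHead(nn.Module):
    """Sketch of the two options I am comparing for the contrastive features."""
    def __init__(self, in_dim=512, num_classes=21, rep_dim=256):
        super().__init__()
        self.classifier = nn.Conv2d(in_dim, num_classes, kernel_size=1)
        # Option A: a separate non-linear projection head for the ReCo loss.
        self.representation = nn.Sequential(
            nn.Conv2d(in_dim, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, rep_dim, kernel_size=1),
        )

    def forward(self, x):
        logits = self.classifier(x)
        rep = self.representation(x)  # Option A: projected features
        # Option B would instead feed x itself (the backbone features
        # before the classifier) into the contrastive loss.
        return logits, rep
```

I.e., in the fully supervised case, is Option A still preferable to Option B, and why?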
Thanks a lot in advance for your insight and explanation :)