danielchyeh opened 3 years ago
Yes, I think you are right.
Thank you!
Btw, I tried using the lr and other parameters you provided to run the baseline (256 batch size) for 200 epochs, and I could reach 69.7% on ImageNet-1K. I'd like to try a fixed lr (no decay) to see if higher performance can be reached.
The following table shows the benchmark... Referenced from the SimSiam paper (https://arxiv.org/pdf/2011.10566.pdf).
Great! Could you update the results after you finish? Thanks.
Hi, @danielchyeh
Have you tried fixed predictor lr? If you did, could you please share your results?
Thank you for the implementation! Nice work!
For me, I just didn't get what the lr decay of the prediction MLP means. Does it mean the lr decay in the pretraining stage, as we normally use?
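To clarify what "fixed predictor lr" refers to here: in the SimSiam paper, the encoder's lr follows a cosine decay schedule, while the predictor MLP's parameter group keeps its base lr constant for the whole pretraining run. A minimal sketch of that scheduling logic (pure Python, no torch; the function name, the `fix_lr` flag, and the group names are illustrative, not the repo's actual API):

```python
import math

def adjust_learning_rate(param_groups, base_lr, epoch, total_epochs):
    """Cosine-decay the lr of every parameter group, except groups
    marked with 'fix_lr': True, which keep the base lr unchanged
    (the SimSiam-style fixed predictor lr)."""
    cosine_lr = base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    for group in param_groups:
        # A real optimizer's param_groups dicts work the same way:
        # setting group["lr"] changes the lr used at the next step.
        group["lr"] = base_lr if group.get("fix_lr") else cosine_lr
    return param_groups

# Two groups: encoder (decayed) and predictor (fixed).
groups = [{"name": "encoder"}, {"name": "predictor", "fix_lr": True}]
adjust_learning_rate(groups, base_lr=0.05, epoch=100, total_epochs=200)
```

Halfway through training (epoch 100 of 200), the encoder group's lr has decayed to 0.025 while the predictor group stays at 0.05. So "lr decay of the prediction MLP" would mean applying the cosine schedule to the predictor group as well, instead of exempting it.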