ToTheBeginning closed this 6 years ago.
They are both new layers, so why are their learning rates different?
It mainly depends on the dataset you want to fine-tune on. In this repo, we set the learning rates empirically and found that they work.
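As a rough illustration of how two new layers can end up with different learning rates, here is a minimal pure-Python sketch of one SGD step in which each layer uses a base rate scaled by its own multiplier. The layer names, base rate and multipliers are hypothetical and are not the values used in this repo. Frameworks expose the same idea through per-layer settings, for example `lr_mult` in Caffe or parameter groups in PyTorch optimizers.

```python
# Hypothetical values for illustration only; not this repo's configuration.
BASE_LR = 0.01
LR_MULT = {
    "backbone": 0.1,   # pretrained layers: small steps to preserve learned features
    "new_fc_a": 1.0,   # new layer trained at the base rate
    "new_fc_b": 10.0,  # new layer trained with larger steps
}


def sgd_step(params, grads, base_lr=BASE_LR, lr_mult=LR_MULT):
    """Apply one plain SGD update where each layer uses base_lr * its multiplier.

    params and grads map layer names to lists of floats. Layers missing from
    lr_mult fall back to a multiplier of 1.0. Returns new lists; inputs are
    left unchanged.
    """
    updated = {}
    for name, weights in params.items():
        lr = base_lr * lr_mult.get(name, 1.0)
        updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return updated


if __name__ == "__main__":
    params = {"backbone": [1.0], "new_fc_a": [1.0], "new_fc_b": [1.0]}
    grads = {"backbone": [1.0], "new_fc_a": [1.0], "new_fc_b": [1.0]}
    print(sgd_step(params, grads))
```

With identical gradients, each layer moves by a different amount, so a larger multiplier lets that layer adapt faster. Which multipliers work well depends on the dataset, which is why they are usually tuned empirically.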