Section 5.1 (Implementation details) mentions a layer decay schedule: "To avoid deteriorating the general representations obtained from the previous stage, a layer decay schedule is adopted to train the student model for all downstream tasks."
Could you share more details about the layer decay schedule, or point me to the code or a reference for it?
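For context, my guess is that this refers to layer-wise learning-rate decay, where layers closer to the input are fine-tuned with geometrically smaller learning rates than the top layer. Here is a minimal sketch of what I am assuming. The function name, the decay factor, and the layer count are my own placeholders and do not come from the paper:

```python
from typing import List


def layerwise_lrs(base_lr: float, num_layers: int, decay: float) -> List[float]:
    """Return one learning rate per layer; index 0 is the layer closest to the input.

    Assumption (not confirmed by the paper): the top layer keeps base_lr and
    every layer below it is scaled by `decay` one more time.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


# Hypothetical values, chosen only to illustrate the shape of the schedule.
lrs = layerwise_lrs(base_lr=1e-3, num_layers=4, decay=0.5)
for i, lr in enumerate(lrs):
    print(f"layer {i}: lr = {lr:.6f}")
```

If this is roughly what the paper does, it would also help to know the decay factor used for each downstream task.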
Thanks