Open guvcolie opened 4 years ago
Thank you for your excellent code! You use teacher-student distillation when training the sub-models. What is the accuracy of the teacher model (kernel size 7, expansion 6, and 4 layers in each unit)?