How is the accuracy of the teacher model (once-for-all model)?

mit-han-lab / once-for-all

[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment

https://ofa.mit.edu/

MIT License


How is the accuracy of the teacher model (once-for-all model)? #14

Open: guvcolie opened this issue 4 years ago

guvcolie commented 4 years ago

Thank you for your excellent code! You use teacher-student distillation when training the sub-models. What is the accuracy of the teacher model (kernel size 7, expansion ratio 6, and 4 layers in each unit)?