mead-ml / mead-baseline

Deep-Learning Model Exploration and Development for NLP
Apache License 2.0

support distillation with soft targets #928

Closed dpressel closed 2 years ago

dpressel commented 2 years ago

Quickly mock up two things: (1) support for more losses in the classifier via kwargs, and (2) what happens when Y is dense, i.e. a soft-target distribution over the classes rather than a class index. In the dense case, the argmax of the soft target is treated as the selected label.
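
As a rough illustration of the two points above, here is a minimal pure-Python sketch. It is not the code from this change and does not use the mead-baseline API: `create_loss`, the `loss` kwarg, the `LOSSES` registry, and `selected_label` are all hypothetical names chosen for this example. It assumes a sparse target is an `int` class index and a dense target is a list of per-class probabilities.

```python
import math
from typing import Callable, Dict, List, Sequence, Union

Target = Union[int, Sequence[float]]  # sparse class index or dense soft target


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a single example."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cross_entropy(logits: Sequence[float], target: Target) -> float:
    """Cross-entropy that accepts either a class index or a soft distribution."""
    logp = log_softmax(logits)
    if isinstance(target, int):
        return -logp[target]
    return -sum(t * lp for t, lp in zip(target, logp))


def kl_divergence(logits: Sequence[float], target: Target) -> float:
    """KL(target || softmax(logits)); for a class index this equals cross-entropy."""
    logp = log_softmax(logits)
    if isinstance(target, int):
        return -logp[target]
    return sum(t * (math.log(t) - lp) for t, lp in zip(target, logp) if t > 0.0)


# Hypothetical registry: the loss is picked by name from kwargs.
LOSSES: Dict[str, Callable[[Sequence[float], Target], float]] = {
    "nll": cross_entropy,
    "kl": kl_divergence,
}


def create_loss(**kwargs) -> Callable[[Sequence[float], Target], float]:
    """Select a loss via a hypothetical `loss` kwarg, defaulting to cross-entropy."""
    return LOSSES[kwargs.get("loss", "nll")]


def selected_label(target: Target) -> int:
    """For a dense target, the argmax of the soft target is the selected label."""
    if isinstance(target, int):
        return target
    return max(range(len(target)), key=lambda i: target[i])


logits = [2.0, 0.5, -1.0]        # student outputs for one example
soft_target = [0.7, 0.2, 0.1]    # dense Y, e.g. teacher probabilities

loss_fn = create_loss(loss="kl")
distill_loss = loss_fn(logits, soft_target)
label = selected_label(soft_target)
```

With a one-hot dense target, `cross_entropy` reduces to the usual sparse-label loss, so the same function can serve both target shapes; `selected_label` gives label-based code a class index in either case.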