Improve GPU utilization in student training

mozilla / translations

The code, training pipeline, and models that power Firefox Translations

https://mozilla.github.io/translations/

Mozilla Public License 2.0

Improve GPU utilization in student training #783

Open eu9ene opened 3 months ago

eu9ene commented 3 months ago

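We noticed that GPU utilization during student training is only around 30%. This is likely because the student model is smaller than the teacher, so each batch does not give the GPU enough work. We can try to improve utilization by increasing the batch size.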
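To check whether a larger batch size actually helps, utilization could be sampled before and after the change. The snippet below is only a rough sketch and not part of the pipeline: it assumes `nvidia-smi` is on the PATH, and the helper names (`parse_utilization`, `sample_utilization`, `mean`) are made up for illustration.

```python
import shutil
import subprocess
from typing import List

# Standard nvidia-smi query: prints one utilization percentage per GPU, one per line.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(output: str) -> List[int]:
    """Parse nvidia-smi output into integer percentages, one per GPU.

    Lines that are not plain integers (for example "[N/A]") are skipped.
    """
    values = []
    for line in output.splitlines():
        line = line.strip()
        if line.isdigit():
            values.append(int(line))
    return values


def sample_utilization() -> List[int]:
    """Return current per-GPU utilization, or [] if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_utilization(result.stdout)


def mean(values: List[int]) -> float:
    """Average of the samples; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    per_gpu = sample_utilization()
    if per_gpu:
        print(f"per-GPU utilization: {per_gpu}, mean: {mean(per_gpu):.1f}%")
    else:
        print("nvidia-smi not found or no readable GPUs")
```

A single sample is noisy, so in practice it would make sense to call `sample_utilization()` periodically during a training run and compare the averages for the old and new batch sizes.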

In comparison, for teacher training: