mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

Only retain the best metric model, and delete the others #850

Open gregtatum opened 1 month ago

gregtatum commented 1 month ago

It costs money to store models in the cloud. We could save a bit, and make the output of the train tasks a bit less confusing, if we just stored a single final model. As far as I have seen, we never use any of the other models, unless I'm missing something.

We would have to make sure that training continuation is updated and that fetching old models for training still works.

What do you think @eu9ene?

eu9ene commented 1 month ago

I agree, but we should double-check that training continuation after preemption will still work this way. I think it needs a bunch of files, like the optimizer config, that Marian writes to the directory. Another consideration is that the GCS cost is likely not too high compared to GPUs, and there are ways to archive things there to make it even cheaper.
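To make the continuation concern concrete, here is a minimal sketch of a cleanup step that deletes only the "other" best-metric models and leaves everything else alone, including the files a resumed run would need. The function name `plan_cleanup` is hypothetical, and the file names (`model.npz.best-<metric>.npz`, `model.npz.optimizer.npz`, `model.npz.progress.yml`) are assumptions about what the train task writes; they should be checked against a real task output directory before anything is deleted.

```python
def plan_cleanup(file_names, best_metric):
    """Split file names into (keep, delete) lists.

    Only files that look like a best-model checkpoint for a metric other
    than `best_metric` are marked for deletion. Everything else is kept,
    so optimizer state, progress files, and logs survive for continuation.

    ASSUMPTION: best-model checkpoints are named
    "model.npz.best-<metric>.npz", optionally with a suffix such as
    ".decoder.yml". Verify this against the actual training output.
    """
    best_marker = "model.npz.best-"
    wanted = f"{best_marker}{best_metric}.npz"

    keep, delete = [], []
    for name in sorted(file_names):
        is_best_checkpoint = name.startswith(best_marker)
        is_wanted = name == wanted or name.startswith(wanted + ".")
        if is_best_checkpoint and not is_wanted:
            delete.append(name)
        else:
            keep.append(name)
    return keep, delete


# Hypothetical directory listing, for illustration only.
example_files = [
    "model.npz",
    "model.npz.optimizer.npz",
    "model.npz.progress.yml",
    "model.npz.best-chrf.npz",
    "model.npz.best-chrf.npz.decoder.yml",
    "model.npz.best-bleu-detok.npz",
    "model.npz.best-ce-mean-words.npz",
    "train.log",
]
keep, delete = plan_cleanup(example_files, best_metric="chrf")
```

Returning a plan instead of deleting directly makes it easy to log or review what would be removed, and to run the deletion only once training has finished and preemption is no longer a concern.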

gregtatum commented 1 month ago

I don't know that we've pulled numbers on storage costs to know them exactly, but my assumption is that it's a function of storage size and cache duration.
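As a rough way to reason about that assumption, the sketch below treats cost as storage size times retention time times a per-GB monthly price. The function names are hypothetical, and all numbers in the example call are made-up placeholders, not real GCS prices or real model sizes; actual figures would have to come from the billing data.

```python
def estimate_storage_cost(size_gb, months, price_per_gb_month):
    """Estimate cost as size x duration x unit price.

    ASSUMPTION: flat per-GB monthly pricing with no retrieval,
    operation, or early-deletion charges.
    """
    return size_gb * months * price_per_gb_month


def estimate_savings(model_size_gb, models_stored, months, price_per_gb_month):
    """Estimate what is saved by keeping one model instead of `models_stored`."""
    extra_models = max(models_stored - 1, 0)
    return estimate_storage_cost(
        model_size_gb * extra_models, months, price_per_gb_month
    )


# Placeholder inputs for illustration only; substitute measured values.
savings = estimate_savings(
    model_size_gb=1.0,        # hypothetical size of one stored model
    models_stored=4,          # hypothetical number of models kept today
    months=12,                # hypothetical retention period
    price_per_gb_month=0.02,  # hypothetical unit price, not a quoted rate
)
```

Plugging in measured sizes and the real price would show whether the saving is meaningful next to the GPU spend for a training run.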