mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

Consolidate yaml schema and configs #597

Open eu9ene opened 6 months ago

eu9ene commented 6 months ago

Currently, when adding or changing a setting in a Taskcluster experiment config, we have to update it in multiple places:

We should consolidate these into:
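The sketch below illustrates the single-source-of-truth idea in Python. It assumes the experiment config has already been parsed from YAML into a dict. The `Setting` type, `EXPERIMENT_SCHEMA`, `load_experiment`, and all setting names are hypothetical and do not come from the repository. The point is that validation and defaults are both derived from one schema, so a new setting only has to be declared once.

```python
from typing import Any, NamedTuple


class Setting(NamedTuple):
    """One entry in a (hypothetical) single schema for experiment settings."""
    kind: type
    required: bool = True
    default: Any = None


# Hypothetical schema: adding a setting means adding one line here, and both
# validation and defaults are derived from it. Names are illustrative only.
EXPERIMENT_SCHEMA: dict[str, Setting] = {
    "src": Setting(str),
    "trg": Setting(str),
    "teacher-ensemble": Setting(int, required=False, default=2),
    "use-opuscleaner": Setting(bool, required=False, default=False),
}


def load_experiment(
    raw: dict[str, Any], schema: dict[str, Setting] = EXPERIMENT_SCHEMA
) -> dict[str, Any]:
    """Validate a parsed config section against the schema and fill in defaults."""
    unknown = sorted(set(raw) - set(schema))
    if unknown:
        # Catches typos and settings that were never declared in the schema.
        raise ValueError(f"unknown settings: {unknown}")

    result: dict[str, Any] = {}
    for name, setting in schema.items():
        if name not in raw:
            if setting.required:
                raise ValueError(f"missing required setting: {name}")
            result[name] = setting.default
            continue
        value = raw[name]
        # bool is a subclass of int, so reject it explicitly for int settings.
        wrong_bool = setting.kind is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, setting.kind):
            raise TypeError(
                f"{name}: expected {setting.kind.__name__}, got {type(value).__name__}"
            )
        result[name] = value
    return result
```

For example, `load_experiment({"src": "lt", "trg": "en"})` returns the two given values plus the defaults for `teacher-ensemble` and `use-opuscleaner`. A misspelled key raises `ValueError` instead of being silently ignored.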

eu9ene commented 6 months ago

Also, I'm still trying to figure out what taskcluster/test/params/large-lt-en.yml and taskcluster/test/params/small-lt-en.yml are for. They seem to be required by some tests and also need to be updated; not updating them breaks the taskgraph-diff task. In any case, the tests should also use the reference production/CI configs from taskcluster/configs.

bhearsum commented 6 months ago


Yes, these params files are used to generate task graphs with and without a given change applied, and to produce a useful diff showing how a code change affects the graphs.

I agree that this could probably be reworked to pull in at least some settings from a shared location. One advantage of having these concrete files, though, is that they let us keep multiple variants. At the moment we have only two, one with more datasets and one with fewer, but we could add variants with and without OpusCleaner/OpusTrainer, with and without publication, with various training continuation configurations, etc.
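One way to keep several concrete variants while still pulling shared settings from one place is to store each variant as a small set of overrides on top of a base config. The sketch below assumes configs are plain nested dicts that have already been parsed from YAML. `deep_merge`, `build_variants`, and the contents of `BASE` and `VARIANTS` are hypothetical stand-ins, not the repository's actual params files or keys.

```python
import copy
from typing import Any


def deep_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: `base` with `overrides` applied recursively.

    Nested dicts are merged key by key; any other value in `overrides`
    (scalars, lists) replaces the value in `base` outright.
    """
    merged = copy.deepcopy(base)  # never mutate the shared base
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base, standing in for a shared reference config.
BASE: dict[str, Any] = {
    "experiment": {"src": "lt", "trg": "en", "opuscleaner": False},
    "datasets": {"train": ["corpus-a", "corpus-b", "corpus-c"]},
}

# Hypothetical variants: each records only how it differs from the base.
VARIANTS: dict[str, dict[str, Any]] = {
    "large": {},
    "small": {"datasets": {"train": ["corpus-a"]}},
    "opuscleaner": {"experiment": {"opuscleaner": True}},
}


def build_variants(
    base: dict[str, Any], variants: dict[str, dict[str, Any]]
) -> dict[str, dict[str, Any]]:
    """Materialize every variant as a full config derived from `base`."""
    return {name: deep_merge(base, overrides) for name, overrides in variants.items()}
```

Each variant then lists only what makes it different (fewer datasets, OpusCleaner enabled, and so on), so a setting added to the base reaches every variant automatically. Lists are replaced rather than concatenated, which keeps each variant's dataset list explicit.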

eu9ene commented 2 weeks ago

Also, we now have a number of test configs in tests/fixtures. The tests should use taskcluster/configs/config.ci.yml instead, so that we have only two configs to maintain.