Consolidate yaml schema and configs

eu9ene commented 6 months ago

Currently, when adding a new setting to a Taskcluster experiment config, or changing an existing one, we have to update it in multiple places:

train action schema
CI default parameters schema
CI default parameters values
Reference production config in taskcluster/configs
CI YAML config in taskcluster/configs that we currently don't use

We should consolidate these into:

production reference YAML config (same as now)
CI YAML config in taskcluster/configs, instead of the one in the parameters JSON
one YAML schema that's used for validation in the train action and elsewhere
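As a rough illustration of the last point, the sketch below keeps a single schema definition and runs the same check over every config, so a new setting would only need to be declared once. It is a stdlib-only sketch under stated assumptions: the keys (`name`, `src`, `trg`, `datasets`, `opuscleaner`) and the `validate_config` helper are hypothetical, and the configs are shown as already-parsed dicts. In practice they would be loaded from the YAML files in taskcluster/configs, and a real implementation would more likely reuse an existing schema library.

```python
# Hypothetical sketch: one schema shared by every consumer (train action,
# CI parameters, tests). The keys below are illustrative, not the real config.
from typing import Any

# Schema: key -> (expected type, required?)
EXPERIMENT_SCHEMA: dict[str, tuple[type, bool]] = {
    "name": (str, True),
    "src": (str, True),
    "trg": (str, True),
    "datasets": (dict, True),
    "opuscleaner": (bool, False),
}


def validate_config(config: dict[str, Any], schema=EXPERIMENT_SCHEMA) -> list[str]:
    """Return a list of validation errors (empty if the config is valid)."""
    errors = []
    # Check required keys and value types in schema order.
    for key, (expected_type, required) in schema.items():
        if key not in config:
            if required:
                errors.append(f"missing required key: {key}")
            continue
        if not isinstance(config[key], expected_type):
            errors.append(
                f"{key}: expected {expected_type.__name__}, "
                f"got {type(config[key]).__name__}"
            )
    # Reject settings that the schema does not know about.
    for key in config:
        if key not in schema:
            errors.append(f"unknown key: {key}")
    return errors
```

The same `validate_config` call would then be used for the production config, the CI config, and the train action input, rather than each of them carrying its own copy of the schema.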

eu9ene commented 6 months ago

Also, I'm still trying to figure out what taskcluster/test/params/large-lt-en.yml and taskcluster/test/params/small-lt-en.yml are for. They seem to be required by some tests and also need to be updated; not updating them breaks the taskgraph-diff task. In any case, the tests should also use the reference production/CI configs from taskcluster/configs.

bhearsum commented 6 months ago

Yes, these are used to generate graphs with and without some changes applied, and to produce a useful diff for seeing how a code change affects the graphs.

I agree that this could probably be reworked to pull in at least some settings from a separate place. One advantage of having these concrete files, though, is that they allow us to have multiple versions. At the moment we have just two, with more and fewer datasets, but we could add variants with and without opuscleaner/opustrainer, with and without publication, with various training continuation configurations, etc.
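One possible way to keep several concrete variants while maintaining a single source config is to describe each variant as a small set of overrides applied on top of a shared base. The sketch below assumes that approach: `deep_merge`, the config keys, the dataset names, and the variant names are all hypothetical, and the base config is inlined here, whereas in practice it would presumably be loaded from taskcluster/configs/config.ci.yml.

```python
# Hypothetical sketch: derive test/CI variants from one base config by
# applying small overrides, instead of maintaining separate full files.
import copy
from typing import Any


def deep_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: base with overrides applied recursively.

    Nested dicts are merged key by key; any other value in overrides
    (including lists) replaces the value in base. Neither input is mutated.
    """
    result = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Illustrative base config (stands in for the shared CI config).
BASE_CONFIG = {
    "experiment": {"src": "lt", "trg": "en", "opuscleaner": True},
    "datasets": {"train": ["corpus_a", "corpus_b", "corpus_c"]},
}

# Each variant only lists what differs from the base.
VARIANTS = {
    "large": {},
    "small": {"datasets": {"train": ["corpus_a"]}},
    "no-opuscleaner": {"experiment": {"opuscleaner": False}},
}

configs = {name: deep_merge(BASE_CONFIG, overrides) for name, overrides in VARIANTS.items()}
```

With this layout, a new setting is added once to the base config, and every variant picks it up automatically unless it explicitly overrides it.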

eu9ene commented 2 weeks ago

Also, we now have a bunch of test configs in tests/fixtures. The tests should use taskcluster/configs/config.ci.yml instead, so that we have only two configs to maintain.

mozilla / translations

Consolidate yaml schema and configs #597