mozilla / firefox-translations-training

Training pipelines for Firefox Translations neural machine translation models
https://mozilla.github.io/firefox-translations-training/
Mozilla Public License 2.0

Add publishing to CI #792

Closed: eu9ene closed 1 month ago

eu9ene commented 1 month ago

Since experiment tracking is now tightly integrated with the training and evaluation scripts, we should make sure refactoring doesn't break compatibility with the tracking module.

Let's enable publishing in CI. To avoid polluting the current default project in the CI config (ru-en), let's detect from the tracking module that we're running in CI and publish to a separate project (named ci).

eu9ene commented 1 month ago

@bhearsum what would be the best/proper way to detect that we're in CI from inside Python code? Do we have any extra environment variables set? Or do we have to just add an extra config setting for this and propagate a new env var?

La0 commented 1 month ago

That CI task has the following env vars:

    FIREFOX_TRANSLATIONS_TRAINING_HEAD_REF: bump_disk_cefilter
    FIREFOX_TRANSLATIONS_TRAINING_HEAD_REV: 3151f7696dc000379b921f08f6d0dad18fc4003a
    FIREFOX_TRANSLATIONS_TRAINING_BASE_REPOSITORY: https://github.com/mozilla/firefox-translations-training
    FIREFOX_TRANSLATIONS_TRAINING_HEAD_REPOSITORY: https://github.com/mozilla/firefox-translations-training
    FIREFOX_TRANSLATIONS_TRAINING_REPOSITORY_TYPE: git

@eu9ene If you link a production training run, we can compare the env variables. Those would be either missing or different enough that we can infer we are on CI (especially the head ref).

vrigal commented 1 month ago

The only difference I saw among those variables is FIREFOX_TRANSLATIONS_TRAINING_HEAD_REF, which seems to start with refs/heads/ for production runs (e.g. MoIUo and RfqPq from the W&B moz-translations workspace). Otherwise, it is the bare branch name.

I will start an implementation based on this.
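
A minimal sketch of that head-ref check, for illustration only. The helper name `is_ci_run` is hypothetical and not part of the repository, and treating a missing variable as "not CI" is an assumption; the prefix rule itself is just the observation from the runs compared above.

```python
import os

HEAD_REF_VAR = "FIREFOX_TRANSLATIONS_TRAINING_HEAD_REF"


def is_ci_run(environ=None):
    """Guess whether this is a CI run from the head ref (hypothetical helper)."""
    env = os.environ if environ is None else environ
    head_ref = env.get(HEAD_REF_VAR)
    if not head_ref:
        # Assumption: no head ref at all means a local run, not CI.
        return False
    # Observation from this thread: production runs use "refs/heads/<branch>",
    # while CI runs carry the bare branch name.
    return not head_ref.startswith("refs/heads/")
```

Passing the environment as an argument keeps the check easy to exercise without touching the real process environment, e.g. `is_ci_run({HEAD_REF_VAR: "bump_disk_cefilter"})`.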

eu9ene commented 1 month ago

We can do a simpler thing. Let's just check the experiment name:

    if experiment["name"] == "ci":
        project = "ci"
    else:
        project = f"{experiment['src']}-{experiment['trg']}"

Here's the CI experiment config:

https://github.com/mozilla/firefox-translations-training/blob/43d5680620408cb2defbd90099d1a25e85c2d215/taskcluster/translations_taskgraph/parameters.py#L19
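
The experiment-name check could be wrapped in a small function so it can be tested on its own. This is a sketch, not the tracking module's actual code: the function name `get_project_name` is hypothetical, and it assumes the experiment config is a dict with `name`, `src` and `trg` keys and that the CI config sets `name` to `ci`.

```python
def get_project_name(experiment):
    """Pick the tracking project for an experiment (hypothetical helper).

    Assumes `experiment` is a dict with "name", "src" and "trg" keys.
    """
    if experiment["name"] == "ci":
        # CI runs go to a dedicated project so they don't pollute ru-en.
        return "ci"
    # Regular runs are grouped by language pair, e.g. "ru-en".
    return f"{experiment['src']}-{experiment['trg']}"
```

For example, `get_project_name({"name": "ci", "src": "ru", "trg": "en"})` selects the `ci` project, while any other experiment name with the same language pair falls back to `ru-en`.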