mozilla / translations

The code, training pipeline, and models that power Firefox Translations
https://mozilla.github.io/translations/
Mozilla Public License 2.0

Publish Marian/OpusTrainer configuration YAMLs and dataset statistics #720

Closed vrigal closed 1 month ago

vrigal commented 3 months ago

Closes #313 Closes #529

vrigal commented 3 months ago

I tested based on branch demo-online-publication (https://github.com/vrigal/firefox-translations-training/commit/8eda3ad7205ccc905d839f388e28592c13cce931), with fake configuration files and datasets. Results are visible as artifacts on this run: https://wandb.ai/teklia/720-online?nw=nwuserrigal.

I pushed a commit (named TRASHME) in order to trigger a training from the CI: KQpiJ_y2R2S5MwVNBt0ZXQ.

vrigal commented 3 months ago

I rebased on top of vrigal:unique-wandb-runs, so it will be easier for you to rebase once we merge it (by dropping the first commit).

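I was actually able to publish all the artifacts from an online training task:

* Taskcluster: https://firefox-ci-tc.services.mozilla.com/tasks/MwT-4_l5TIOOObkKiyW6pQ

* W&B run artifacts: https://wandb.ai/moz-translations/ru-en/runs/uqqbbf6r/artifacts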

eu9ene commented 3 months ago

> I rebased on top of vrigal:unique-wandb-runs, so it will be easier for you to rebase once we merge it (by dropping the first commit).
>
> I was actually able to publish all the artifacts from an online training task:
>
> * Taskcluster: https://firefox-ci-tc.services.mozilla.com/tasks/MwT-4_l5TIOOObkKiyW6pQ
>
> * W&B run artifacts: https://wandb.ai/moz-translations/ru-en/runs/uqqbbf6r/artifacts

It's great that we were able to extract this info! However, I looked at https://wandb.ai/moz-translations/ru-en/runs/uqqbbf6r/artifacts and I don't think it will be convenient to use it in this form. Let's discuss live what we can do about it.

eu9ene commented 3 months ago

To reiterate what we discussed with @La0:

vrigal commented 3 months ago

@eu9ene I implemented the new publication. I restored the TRASHME commit (must be removed before merging). The new results are visible on the link below: https://wandb.ai/moz-translations/ru-en/groups/ci_FGBmVk5oQNSb7D9_7GaLSw/workspace?nw=nwuserbabadiemoz

vrigal commented 3 months ago

I finally decided to handle publishing extra-args for offline taskcluster/experiments in this PR. I ran a CI with group ZVWvAVL9Quq3hOiEkQxKdg; the results are published here: https://wandb.ai/moz-translations/ru-en/groups/ci_ZVWvAVL9Quq3hOiEkQxKdg/

I will remove the last commit (used to trigger the CI) once all tasks complete.

vrigal commented 3 months ago

@eu9ene I published new results with your suggestions: https://wandb.ai/moz-translations/ru-en/groups/ci_IYdCmIuJSTW3Cw3_9TaHbA

I'm removing the commit used to trigger the jobs in Taskcluster.

As a reminder, here are the 2 other groups created during this development (this may help you clean things up):

eu9ene commented 2 months ago

I see two datasets tables for some reason: https://wandb.ai/moz-translations/ru-en/runs/teacher-1_IYdCm/workspace?nw=nwuserepavlov

vrigal commented 2 months ago

> I see two datasets tables for some reason: https://wandb.ai/moz-translations/ru-en/runs/teacher-1_IYdCm/workspace?nw=nwuserepavlov

This is certainly related to https://github.com/mozilla/firefox-translations-training/issues/716#issuecomment-2278790264. When I use a different workspace (i.e. nwuserbabadiemoz) everything seems fine.

eu9ene commented 2 months ago

As discussed, I suggest we wait for #792 to ensure everything works correctly and then merge.

eu9ene commented 2 months ago

@vrigal I triggered the CI and there are two "group_logs" there https://wandb.ai/moz-translations/ci?nw=nwuserepavlov

vrigal commented 2 months ago

> @vrigal I triggered the CI and there are two "group_logs" there https://wandb.ai/moz-translations/ci?nw=nwuserepavlov

Nice catch, thank you! I identified the problem and opened #820.

eu9ene commented 2 months ago

@vrigal there's an issue with the training arguments for the finetune-student step: https://wandb.ai/moz-translations/ci/runs/finetune-student_ZrWxf/overview

vrigal commented 1 month ago

> @vrigal there's an issue with training arguments for finetune-student step https://wandb.ai/moz-translations/ci/runs/finetune-student_ZrWxf/overview

Thank you! The script was detecting the training and model configs based on the end of the filename (*.train.yml). I updated it to detect them based on the start of the string (configs/model, configs/training), which is more logical. I'm pushing to CI again.
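For illustration, the prefix-based detection could look roughly like the sketch below. This is not the code from the PR: the function name `classify_config`, the mapping `CONFIG_PREFIXES`, and the example file names are hypothetical. Only the two prefixes (configs/model, configs/training) come from the comment above.

```python
from typing import Optional

# Prefixes taken from the comment above; the labels they map to are
# an assumption made for this sketch.
CONFIG_PREFIXES = {
    "configs/model": "model",
    "configs/training": "training",
}


def classify_config(path: str) -> Optional[str]:
    """Return "model" or "training" if the path starts with a known prefix.

    Hypothetical helper: matching on the start of the string avoids
    depending on how the file name happens to end.
    """
    for prefix, kind in CONFIG_PREFIXES.items():
        if path.startswith(prefix):
            return kind
    return None


if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the helper.
    for candidate in (
        "configs/training/teacher.train.yml",
        "configs/model/student.yml",
        "logs/train.log",
    ):
        print(candidate, "->", classify_config(candidate))
```

With this approach a file is classified by the directory it lives in, so a config whose name does not follow the *.train.yml pattern is still picked up.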

vrigal commented 1 month ago

New results have been published here: https://wandb.ai/moz-translations/ci/groups/ci_bT86ivvqSPSTJT0T2d-k_A/workspace?nw=nwuserepavlov I'm removing the parameter used to trigger CI.