mozilla / firefox-translations-training

Training pipelines for Firefox Translations neural machine translation models
https://mozilla.github.io/firefox-translations-training/
Mozilla Public License 2.0

Taskcluster evaluation artifacts on GCP are missing an importer #808

Open eu9ene opened 4 weeks ago

eu9ene commented 4 weeks ago

For example:

gsutil ls gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student

shows

gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/Neulab-tedtalks_test-1-eng-lit.en
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/Neulab-tedtalks_test-1-eng-lit.en.ref
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/Neulab-tedtalks_test-1-eng-lit.lt
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/Neulab-tedtalks_test-1-eng-lit.metrics
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_Neulab-tedtalks_test-1-eng-lit.en
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_Neulab-tedtalks_test-1-eng-lit.en.ref
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_Neulab-tedtalks_test-1-eng-lit.lt
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_Neulab-tedtalks_test-1-eng-lit.metrics
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_devtest.en
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_devtest.en.ref
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_devtest.lt
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_devtest.metrics
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_wmt19.en
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_wmt19.en.ref
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_wmt19.lt
gs://moz-fx-translations-data--303e-prod-translations-data/models/lt-en/opustrainer_no_student_aug_K1iHndFUSxSEDRLg_H9l1A/evaluation/student/aug-mix_wmt19.metrics

For example, aug-mix_wmt19.metrics should be sacrebleu_aug-mix_wmt19.metrics, as we have for the old Snakemake experiments. Otherwise we can't upload it to W&B to compare with other experiments. See this comment: https://github.com/mozilla/firefox-translations-training/pull/799#issuecomment-2298932937
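For the files listed above, the intended transformation can be sketched as follows (a hypothetical helper, not code from the repo; the dataset-to-importer mapping is inferred from the listing and may be incomplete):

```python
# Hypothetical sketch: prepend the importer name to an evaluation
# artifact name, matching the naming used by the old Snakemake
# experiments. The mapping below is inferred from the listing above.
def add_importer_prefix(file_name: str) -> str:
    # Drop an augmentation prefix such as "aug-mix_" before matching
    dataset = file_name.split('_', 1)[1] if file_name.startswith('aug') else file_name
    if dataset.startswith('wmt'):
        importer = 'sacrebleu'
    elif dataset.startswith(('devtest', 'test')):
        importer = 'flores'
    elif dataset.startswith('Neulab'):
        importer = 'mtdata'
    else:
        raise ValueError(f'Unknown dataset: {dataset}')
    return f'{importer}_{file_name}'
```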

eu9ene commented 2 weeks ago

Ok, I renamed the existing artifacts with this script:

import subprocess

# List every evaluation artifact in the prod bucket (gsutil expands the wildcards)
ls_cmd = 'gsutil ls gs://moz-fx-translations-data--303e-prod-translations-data/models/*/*/evaluation/*/*'
result = subprocess.run(ls_cmd.split(), text=True, stdout=subprocess.PIPE, check=True)
eval_files = result.stdout.split('\n')

for file in eval_files:
    if not file.strip():
        continue

    sep_pos = file.rfind('/')
    file_name = file[sep_pos + 1:]

    # Skip files that already have an importer prefix
    if file_name.startswith(('mtdata', 'sacrebleu', 'flores')):
        print(f'Skipping {file_name}, already renamed')
        continue

    # Drop the augmentation prefix (e.g. "aug-mix_") to get the dataset name
    if file_name.startswith('aug'):
        dataset_name = file_name.split('_', 1)[1]
    else:
        dataset_name = file_name

    # Map the dataset to the importer that produced it
    if dataset_name.startswith('wmt'):
        importer = 'sacrebleu'
    elif dataset_name.startswith(('devtest', 'test')):
        importer = 'flores'
    elif dataset_name.startswith('Neulab'):
        importer = 'mtdata'
    else:
        raise ValueError(f'Unknown dataset name: {dataset_name}')

    new_file_name = f'{importer}_{file_name}'
    new_file = f'{file[:sep_pos]}/{new_file_name}'
    rename_cmd = f'gsutil mv {file} {new_file}'
    print(f'Renaming {file_name} to {new_file_name} with command {rename_cmd}')
    subprocess.run(rename_cmd.split(), check=True)

eu9ene commented 2 weeks ago

As discussed with @bhearsum, we should rename the output files in the tasks.

As for the uploading script, we can either incorporate the renaming logic there (there may be extra datasets we'd need to account for), or leave it as is, in which case we'd have to rename the artifacts again if we ever need to reupload those runs to W&B. Since we hope to reupload only once, using the Taskcluster artifacts directly, we can go with option two. We'll use online W&B tracking for the future runs.
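If we ever went with option one, the uploading script could normalize names on the fly instead of renaming anything in GCS. A hypothetical sketch (the function name and mapping are assumptions, not the actual uploader API):

```python
# Hypothetical sketch: make the uploader tolerant of both old
# (unprefixed) and new (importer-prefixed) artifact names, so old
# Taskcluster artifacts could be reuploaded to W&B without a bulk rename.
KNOWN_IMPORTERS = ('sacrebleu', 'mtdata', 'flores')

def normalize_eval_name(file_name: str) -> str:
    if file_name.startswith(KNOWN_IMPORTERS):
        return file_name  # already has an importer prefix
    # Drop an augmentation prefix such as "aug-mix_" before matching
    dataset = file_name.split('_', 1)[1] if file_name.startswith('aug') else file_name
    for dataset_prefix, importer in (('wmt', 'sacrebleu'), ('devtest', 'flores'),
                                     ('test', 'flores'), ('Neulab', 'mtdata')):
        if dataset.startswith(dataset_prefix):
            return f'{importer}_{file_name}'
    raise ValueError(f'Unknown dataset: {dataset}')
```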