Open eu9ene opened 3 months ago
Ok, I renamed the existing artifacts with this script:
import os
import subprocess

ls_cmd = 'gsutil ls gs://moz-fx-translations-data--303e-prod-translations-data/models/*/*/evaluation/*/*'
result = subprocess.run(list(ls_cmd.split()), universal_newlines=True, stdout=subprocess.PIPE)
eval_files = result.stdout.split('\n')

for file in eval_files:
    if not file.strip():
        continue

    sep_pos = file.rfind('/')
    file_name = file[sep_pos+1:]

    if file_name.startswith('mtdata') or file_name.startswith('sacrebleu') or file_name.startswith('flores'):
        print(f'Skipping {file_name}, already renamed')
        continue

    if file_name.startswith('aug'):
        dataset_sep_pos = file_name.find('_')
        dataset_name = file_name[dataset_sep_pos+1:]
    else:
        dataset_name = file_name

    importer = ""
    if dataset_name.startswith('wmt'):
        importer = 'sacrebleu'
    elif dataset_name.startswith('devtest'):
        importer = 'flores'
    elif dataset_name.startswith('test'):
        importer = 'flores'
    elif dataset_name.startswith('Neulab'):
        importer = 'mtdata'
    else:
        raise ValueError(f'Unknown dataset name: {dataset_name}')

    new_file_name = f'{importer}_{file_name}'
    new_file = file[:sep_pos] + '/' + new_file_name
    rename_cmd = f'gsutil mv {file} {new_file}'
    print(f'Renaming {file_name} to {new_file_name} with command {rename_cmd}')
    subprocess.run(list(rename_cmd.split()))
    print('\n')
As discussed with @bhearsum, we should rename the output files in the tasks.
As for the uploading script, we have two options. We can incorporate the renaming logic there (we might need to specify some extra datasets), or we can leave the script as is, in which case we'll have to rename again if we ever need to reupload those runs to W&B. Since we hope to reupload only once, and to use the Taskcluster artifacts directly for this, we can go with option two. We'll use online W&B tracking for future runs.
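If we ever do go with option one, the renaming rule could be pulled out into a small pure function that the uploading script calls on each file name. The following is only a sketch under that assumption: `add_importer_prefix`, `KNOWN_IMPORTERS` and `DATASET_PREFIX_TO_IMPORTER` are hypothetical names that don't exist in the repository, and the mapping only covers the datasets handled by the script above, so extra datasets would need to be added to it.

```python
# Hypothetical helper, not part of the repository: it mirrors the rules
# of the one-off rename script above without touching GCS.

# Importer prefixes that mark a file as already renamed.
KNOWN_IMPORTERS = ("mtdata", "sacrebleu", "flores")

# Dataset-name prefix -> importer. Only the datasets seen so far are
# listed; extend this mapping for any extra datasets.
DATASET_PREFIX_TO_IMPORTER = {
    "wmt": "sacrebleu",
    "devtest": "flores",
    "test": "flores",
    "Neulab": "mtdata",
}


def add_importer_prefix(file_name: str) -> str:
    """Return file_name prefixed with its importer, e.g. 'sacrebleu_'."""
    if file_name.startswith(KNOWN_IMPORTERS):
        # Already renamed, leave untouched.
        return file_name

    dataset_name = file_name
    if file_name.startswith("aug"):
        # Drop the augmentation part: 'aug-mix_wmt19.metrics' -> 'wmt19.metrics'.
        dataset_name = file_name[file_name.find("_") + 1:]

    for prefix, importer in DATASET_PREFIX_TO_IMPORTER.items():
        if dataset_name.startswith(prefix):
            return f"{importer}_{file_name}"

    raise ValueError(f"Unknown dataset name: {dataset_name}")


print(add_importer_prefix("aug-mix_wmt19.metrics"))  # sacrebleu_aug-mix_wmt19.metrics
```

Keeping the rule free of `gsutil` calls makes it easy to check against a list of file names before anything is moved, and unknown datasets still fail loudly instead of being uploaded under the wrong name.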
For example, aug-mix_wmt19.metrics should be named sacrebleu_aug-mix_wmt19.metrics, like we have for the old Snakemake experiments. Otherwise we can't upload it to W&B to compare with other experiments. See this comment: https://github.com/mozilla/firefox-translations-training/pull/799#issuecomment-2298932937