eu9ene closed 3 months ago
Same here https://wandb.ai/moz-translations/vi-en/workspace?nw=nwuserepavlov https://firefox-ci-tc.services.mozilla.com/tasks/groups/Nc0SHbrgQaiFt4_FmKBXOA
I guess it started recently. It seems the eval tasks are causing it.
This is a serious bug, as all subsequent evaluation tasks fail with this message:
Multiple W&B runs already exist with name 'teacher-1': [<Run moz-translations/tr-en/ged9nebc (finished)>, <Run moz-translations/tr-en/shovl7mn (finished)>, …]. No data will be published.
This may be caused by the current implementation, which lists existing runs before publishing (the run ID is required to resume a run in W&B). With 2+ tasks running concurrently (these tasks ran within ~3 minutes of each other), a race condition is possible: each task finds no existing run and creates its own, leaving multiple runs with the same display name, which then causes this error.
This seems to confirm that our approach of using unique run IDs (and names) is the way to go.
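The race described above is a classic check-then-act problem. Below is a minimal sketch that reproduces it with two threads against a hypothetical in-memory store. `FakeRunStore`, `publish_by_name`, `publish_by_id` and the ID `tr-en-teacher-1` are illustrative assumptions, not the actual task code or the wandb API. A `threading.Barrier` forces both tasks to finish listing runs before either one creates a run. In the second variant, every task targets a run ID fixed up front (in W&B terms, passing a known `id` to `wandb.init` rather than searching by display name), so no duplicate appears.

```python
import threading
import uuid


class FakeRunStore:
    """Hypothetical in-memory stand-in for a W&B project (not the wandb API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.runs = {}  # run ID -> display name

    def list_by_name(self, name):
        with self._lock:
            return [rid for rid, n in self.runs.items() if n == name]

    def create_or_resume(self, name, run_id=None):
        # A fresh random ID always creates a new run; a known ID reuses the existing one.
        with self._lock:
            run_id = run_id or uuid.uuid4().hex[:8]
            self.runs.setdefault(run_id, name)
            return run_id


def publish_by_name(store, name, barrier):
    # Check-then-act: look up runs by display name, create one if none is found.
    existing = store.list_by_name(name)
    barrier.wait()  # force both tasks to finish listing before either one creates
    if not existing:
        store.create_or_resume(name)


def publish_by_id(store, name, run_id):
    # No lookup by name: every task targets the same run ID chosen up front.
    store.create_or_resume(name, run_id=run_id)


def run_tasks(target, args, n_tasks=2):
    threads = [threading.Thread(target=target, args=args) for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def simulate_by_name():
    store = FakeRunStore()
    run_tasks(publish_by_name, (store, "teacher-1", threading.Barrier(2)))
    return len(store.list_by_name("teacher-1"))


def simulate_by_id():
    store = FakeRunStore()
    run_tasks(publish_by_id, (store, "teacher-1", "tr-en-teacher-1"))
    return len(store.list_by_name("teacher-1"))


print(simulate_by_name())  # 2: duplicate runs named 'teacher-1'
print(simulate_by_id())    # 1: both tasks share one run
```

Once the name-based variant has produced two runs, any later task that looks up 'teacher-1' finds multiple matches, which corresponds to the "Multiple W&B runs already exist" condition in the error above.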
https://wandb.ai/moz-translations/tr-en/workspace?nw=nwuserepavlov
https://firefox-ci-tc.services.mozilla.com/tasks/groups/SDD81N6sRu61LOL4xZJc-Q