TaskCluster docs for non-maintainers

AmitMY commented 5 months ago

It's me again. I've been trying to get this repository to work for over two years now. This project still seems like the best machine translation project to train models for production and for offline use.

Since you completely ditched snakemake, I'll try to get taskcluster to work again.

The documentation: https://mozilla.github.io/firefox-translations-training/task-cluster.html seems to be aimed at maintainers of this specific repository.

Could you please add some information for people outside of this repo? I expect step 1 to be "Fork this repository", and then step 2 to be how to set up this repo with task cluster.

gregtatum commented 5 months ago

We've invested heavily in getting the system working on Mozilla infrastructure, which uses Taskcluster. Unfortunately, from what I hear from our Taskcluster support team, it's hard to stand up your own Taskcluster instance.

There are integrations with our infrastructure that allow project maintainers to push a branch to github.com/mozilla/firefox-translations-training and then, from the decision task, trigger training. This all runs on Mozilla-managed infrastructure.

I'd really like to figure out a way to help you get this running. In our testing infrastructure we actually have a way to run the tasks through a run_task utility. Perhaps you could import that utility into a Python script and then process full-task-graph.json to build a dependency graph. If you are running on your own managed machine, it could be possible to just run everything locally on that machine.

I'd be happy to hop on a call to discuss this further, or specify what work it would take here.

https://github.com/mozilla/firefox-translations-training/blob/067ce65e3c19cce0de950401da802cf4bf07e7a3/tests/fixtures/__init__.py#L112

You would probably want to wait until I finish up work in PR #568, as that will make the docker image and run_task abstraction just work.

If you run task preflight-check it will generate the artifacts/full-task-graph.json file which fully specifies the tasks.
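As a rough sketch of the "process the task graph to build a dependency graph" idea: the snippet below assumes the JSON file maps each task label to an object whose "dependencies" field maps edge names to other task labels. That layout is an assumption about the generated file, not something confirmed in this thread, and run_one is a hypothetical callback standing in for whatever wraps run_task.

```python
import json
from graphlib import TopologicalSorter


def load_dependencies(path):
    """Map each task label to the set of labels it depends on."""
    with open(path) as f:
        graph = json.load(f)
    # Assumed layout: {label: {"dependencies": {edge_name: label}, ...}}
    return {
        label: set(task.get("dependencies", {}).values())
        for label, task in graph.items()
    }


def execution_order(deps):
    """Return labels ordered so that every dependency comes first."""
    return list(TopologicalSorter(deps).static_order())


def run_all(deps, run_one):
    """Call run_one(label) for each task, dependencies first.

    run_one is a hypothetical callback, e.g. a wrapper around run_task.
    """
    for label in execution_order(deps):
        run_one(label)
```

On a single machine this would run everything sequentially; it does nothing about per-task environments or resource requirements.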

gregtatum commented 5 months ago

> and then step 2 to be how to set up this repo with task cluster.

To be explicit on this request. I don't know that we have these steps or clear recommendations.

AmitMY commented 5 months ago

hmmm if run_task can be local, then in theory it can just run on a single machine. I do see one difficulty with that, which is that each task requires its own environment.

If I understand you right, you propose:

1. Run task preflight-check to generate artifacts/full-task-graph.json.
2. Iterate over the DAG and somehow call run_task based on the information in the JSON.

Is this right?

To complicate it a little more, I would ideally want to try to mimic your old snakemake solution on slurm:

1. Same as above: generate artifacts/full-task-graph.json.
2. Submit slurm jobs for all currently "available" tasks. Each task dumps its "status" (success/error) to a file.
3. Whenever a job ends, it triggers a re-check for newly available tasks.
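The polling scheme described above could be sketched roughly as follows. Everything here is illustrative: the one-file-per-task status layout, the ".status" suffix, and the submit callback (which would wrap something like sbatch) are assumptions, and task labels are assumed to be safe to use as file names.

```python
from pathlib import Path


def read_status(status_dir, label):
    """Return 'success', 'error', or None if the task has not finished."""
    path = Path(status_dir) / f"{label}.status"  # hypothetical file layout
    return path.read_text().strip() if path.exists() else None


def ready_tasks(deps, status_dir, submitted):
    """Labels not yet submitted whose dependencies have all succeeded."""
    return sorted(
        label
        for label, parents in deps.items()
        if label not in submitted
        and all(read_status(status_dir, p) == "success" for p in parents)
    )


def submit_ready(deps, status_dir, submitted, submit):
    """Submit every ready task once; call again whenever a job ends."""
    for label in ready_tasks(deps, status_dir, submitted):
        submit(label)  # hypothetical, e.g. a wrapper that calls sbatch
        submitted.add(label)
```

A task whose dependency reported "error" never becomes ready, so a failed branch simply stops while unrelated branches keep going.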

eu9ene commented 5 months ago

@bhearsum had an idea of converting our Taskcluster graph to snakemake. I imagine generating a Snakefile using task-graph.json. @AmitMY You can try prototyping that but we won't have any resources in the near future to work on that as we're pushing hard on training the next pack of languages for Firefox. I can review PRs or provide recommendations on implementation but that's about it.

@AmitMY, thank you for your continued interest in our project! The complexity of the pipeline is quite high and it is indeed hard to spin it up. Some people from the University of Edinburgh and the University of Helsinki have successfully used it in the past on their infrastructure, so I think it's possible but requires some hacking. I hope we'll have resources to invest in making it more user-friendly in the future.

gregtatum commented 5 months ago

> which is that each task requires its own environment

In #568 I'm creating a virtual environment per requirements file, which could help solve that, and we're working on having a Docker environment where everything works.
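For readers who want to try the one-environment-per-requirements-file idea on their own machine, here is a minimal sketch using the standard library's venv module. It is not taken from #568; the directory naming scheme and the function names are hypothetical, and a POSIX layout (bin/python) is assumed.

```python
import hashlib
import subprocess
import venv
from pathlib import Path


def venv_dir_for(requirements, root):
    """Pick a venv directory keyed on the requirements file's contents."""
    digest = hashlib.sha256(Path(requirements).read_bytes()).hexdigest()[:12]
    return Path(root) / f"{Path(requirements).stem}-{digest}"


def ensure_venv(requirements, root):
    """Create the venv and install the requirements if it is missing."""
    target = venv_dir_for(requirements, root)
    if not target.exists():
        venv.create(target, with_pip=True)
        # POSIX layout assumed; on Windows the interpreter is Scripts\python.exe.
        python = target / "bin" / "python"
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return target
```

Keying the directory on a content hash means tasks that share a requirements file reuse one environment, and editing the file produces a fresh one.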

bhearsum commented 5 months ago

> > and then step 2 to be how to set up this repo with task cluster.

> To be explicit on this request. I don't know that we have these steps or clear recommendations.

The prerequisite for this would be spinning up a Taskcluster instance. This is not impossible (there are a handful of non-Mozilla installations already), but I get the sense that it's not a practical option for you, @AmitMY.

> @bhearsum had an idea of converting our Taskcluster graph to snakemake. I imagine generating a Snakefile using task-graph.json. @AmitMY You can try prototyping that but we won't have any resources in the near future to work on that as we're pushing hard on training the next pack of languages for Firefox. I can review PRs or provide recommendations on implementation but that's about it.

Yeah, if we want the pipeline to be generally usable, this is the most practical way IMO. (Either to snakemake, or it could even be a conversion to something like Metaflow. The main point is that we'd want something that can reliably dump the current DAG / task payloads into a more generally useful format.)
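To make the "dump the DAG into a Snakefile" idea concrete, here is a small sketch that renders one Snakemake rule per task from a label-to-dependencies mapping. It is only an illustration: the done/ sentinel files, the rule naming, and the command_for callback (which would have to turn a task payload into a shell command) are all hypothetical, and labels are assumed to be non-empty.

```python
import re


def rule_name(label):
    """Snakemake rule names must be identifiers; task labels often are not."""
    name = re.sub(r"\W", "_", label)
    return f"t_{name}" if name[:1].isdigit() else name


def done_file(label):
    # Hypothetical sentinel file marking a finished task.
    return f"done/{label}.done"


def to_snakefile(deps, command_for):
    """Render a Snakefile; command_for(label) returns the shell command."""
    targets = ", ".join(repr(done_file(label)) for label in sorted(deps))
    rules = [f"rule all:\n    input: {targets}"]
    for label in sorted(deps):
        inputs = ", ".join(repr(done_file(p)) for p in sorted(deps[label]))
        lines = [f"rule {rule_name(label)}:"]
        if inputs:
            lines.append(f"    input: {inputs}")
        # touch() makes Snakemake create the sentinel when the command succeeds.
        lines.append(f"    output: touch({done_file(label)!r})")
        lines.append(f"    shell: {command_for(label)!r}")
        rules.append("\n".join(lines))
    return "\n\n".join(rules) + "\n"
```

Real task payloads would need more care than this, for example commands containing curly braces, which Snakemake treats as placeholders in shell strings, and real input/output artifacts rather than sentinel files.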

sylvestre commented 5 months ago

Also, depending on what you are trying to do, we could discuss collaboration, access, and support. Don't hesitate to contact me: s@mozilla.com

mozilla / translations

TaskCluster docs for non-maintainers #586