Open AmitMY opened 5 months ago
We've invested heavily in getting the system working on Mozilla infrastructure, which uses Taskcluster. Unfortunately, from what I hear from our Taskcluster support team, it's hard to stand up your own Taskcluster instance.
There are integrations with our infrastructure that allow project maintainers to push a branch to github.com/mozilla/firefox-translations-training and then trigger training from the decision task. All of this runs on Mozilla-managed infrastructure.
I'd really like to figure out a way to help you get this running. In our testing infrastructure we actually have a way to run the tasks through a run_task utility. Perhaps you could import that utility into a Python script and then process full-task-graph.json to build a dependency graph. If you are running on your own managed machine, it could be possible to just run everything locally on that machine.
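To make the "build a dependency graph and run everything locally" idea concrete, here is a minimal sketch. It assumes full-task-graph.json is a JSON object mapping each task label to an entry that lists its upstream task labels as the values of a "dependencies" dict. `run_all` and the `run` callback are hypothetical names, not part of this repo; `run` stands in for whatever actually executes one task (for example a wrapper around run_task, whose real interface is not shown here).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict


def run_all(tasks: Dict[str, dict], run: Callable[[str, dict], bool]) -> Dict[str, str]:
    """Run every task on one machine, each after the tasks it depends on.

    `tasks` maps task label -> task entry (assumed full-task-graph.json shape);
    each entry is assumed to list upstream labels under "dependencies".
    `run(label, entry)` is a hypothetical stand-in for the code that executes
    one task; it returns True on success.
    """
    graph = {
        label: set(entry.get("dependencies", {}).values())
        for label, entry in tasks.items()
    }
    results: Dict[str, str] = {}
    # static_order() yields each label only after all of its dependencies.
    for label in TopologicalSorter(graph).static_order():
        if label not in tasks:
            # A dependency outside this graph: nothing to run here,
            # so its dependents will be marked "skipped" below.
            continue
        if any(results.get(dep) != "success" for dep in graph[label]):
            results[label] = "skipped"  # an upstream task failed or was skipped
        else:
            results[label] = "success" if run(label, tasks[label]) else "error"
    return results
```

Usage would look like `run_all(json.load(open("artifacts/full-task-graph.json")), my_runner)`, where `my_runner` is your own (hypothetical) task launcher. Skipping the dependents of a failed task, rather than aborting immediately, lets independent branches of the graph keep making progress.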
I'd be happy to hop on a call to discuss this further, or to spell out here what work it would take.
You'll probably want to wait until I finish up the work in PR #568, as that will make the Docker image and the run_task abstraction just work.
If you run task preflight-check, it will generate the artifacts/full-task-graph.json file, which fully specifies the tasks.
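Before wiring anything up, it can help to inspect what the generated file contains. This is a small sketch assuming the file is a JSON object keyed by task label and that each entry carries a "kind" field and a "dependencies" dict; the helper names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_tasks(path="artifacts/full-task-graph.json"):
    """Load the task graph: a JSON object mapping task label -> task entry."""
    return json.loads(Path(path).read_text())


def count_by_kind(tasks):
    """How many tasks of each kind the graph contains (assumes a "kind" field)."""
    return Counter(entry.get("kind", "<unknown>") for entry in tasks.values())


def root_tasks(tasks):
    """Tasks with no dependencies: the first ones a local runner could start."""
    return sorted(
        label for label, entry in tasks.items() if not entry.get("dependencies")
    )
```

For example, `count_by_kind(load_tasks())` gives a quick overview of how many tasks of each kind the pipeline would run for a given config.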
and then step 2 to be how to set up this repo with task cluster.
To be explicit on this request. I don't know that we have these steps or clear recommendations.
Hmm, if run_task can be local, then in theory it can all run on a single machine. I do see one difficulty with that, which is that each task requires its own environment.
If I understand you right, you propose:
1. task preflight-check to generate artifacts/full-task-graph.json
2. run_task based on the information in the JSON.

Is this right?
To complicate it a little more, I would ideally want to try to mimic your old snakemake solution on slurm:
1. slurm jobs for all currently "available" tasks
2. each task dumps its "status" (success/error) in some file
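The slurm scheme described above (submit a job for every task whose dependencies have finished, with each task writing its status to a file) can be sketched as follows. The status-file convention, the "running" marker written at submission time, and all function names are assumptions for illustration; the task entries are again assumed to list upstream labels under "dependencies".

```python
from pathlib import Path

# Hypothetical status-file convention: one "<label>.status" file per task,
# holding "running" (written at submission), "success", or "error".
RUNNING, SUCCESS, ERROR = "running", "success", "error"


def status_file(status_dir, label):
    # Keep labels filesystem-safe in case they contain "/".
    return Path(status_dir) / (label.replace("/", "_") + ".status")


def write_status(status_dir, label, status):
    """Record a task's status; the job script would write success/error on exit."""
    Path(status_dir).mkdir(parents=True, exist_ok=True)
    status_file(status_dir, label).write_text(status + "\n")


def load_statuses(status_dir, labels):
    """Read whatever status files exist for the given labels."""
    statuses = {}
    for label in labels:
        path = status_file(status_dir, label)
        if path.exists():
            statuses[label] = path.read_text().strip()
    return statuses


def ready_tasks(tasks, statuses):
    """Tasks not yet submitted whose dependencies have all succeeded."""
    ready = []
    for label, entry in tasks.items():
        if label in statuses:
            continue  # already running, done, or failed
        deps = entry.get("dependencies", {}).values()
        if all(statuses.get(dep) == SUCCESS for dep in deps):
            ready.append(label)
    return sorted(ready)
```

A polling loop would call `load_statuses`, write `RUNNING` for each label returned by `ready_tasks`, submit one job per label (for example with `sbatch`), sleep, and repeat until nothing is ready or running. Writing "running" at submission is what stops the next poll from submitting the same task twice.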
@AmitMY, thank you for your continued interest in our project! The complexity of the pipeline is quite high and it is indeed hard to spin it up. Some people from the University of Edinburgh and the University of Helsinki have successfully used it in the past on their infrastructure, so I think it's possible, but it requires some hacking. I hope we'll have resources to invest in making it more user-friendly in the future.
which is that each task requires its own environment
In #568 I'm creating a virtual environment per requirements file, which could help solve that, and we're working on having a Docker environment where everything works.
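As an illustration of the per-requirements-file environment idea (not the implementation in PR #568), this sketch keys each virtualenv by the requirements file's name plus a hash of its contents, so tasks sharing a file share an environment and editing the file yields a fresh one. The `.venvs` root, the `.installed` marker, and the function names are hypothetical.

```python
import hashlib
import subprocess
import sys
import venv
from pathlib import Path


def env_name(requirements_path, contents):
    """Name an environment after the requirements file and a hash of its contents."""
    digest = hashlib.sha256(contents).hexdigest()[:12]
    return f"{Path(requirements_path).stem}-{digest}"


def ensure_env(requirements_path, root=".venvs"):
    """Create the virtualenv for a requirements file if needed; return its python."""
    req = Path(requirements_path)
    env_dir = Path(root) / env_name(req, req.read_bytes())
    if sys.platform == "win32":
        python = env_dir / "Scripts" / "python.exe"
    else:
        python = env_dir / "bin" / "python"
    marker = env_dir / ".installed"  # written only after pip succeeds
    if not marker.exists():
        venv.EnvBuilder(with_pip=True).create(env_dir)
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(req)], check=True
        )
        marker.touch()
    return python
```

A task could then be launched with the returned interpreter, e.g. `subprocess.run([str(ensure_env(req_file)), script], check=True)`. The marker file means a failed `pip install` is retried on the next run instead of leaving a half-built environment that looks complete.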
and then step 2 to be how to set up this repo with task cluster.
To be explicit on this request. I don't know that we have these steps or clear recommendations.
The prerequisite for this would be spinning up a Taskcluster instance. This is not impossible (there are a handful of non-Mozilla installations already), but I get the sense that it's not a practical approach for you, @AmitMY.
@bhearsum had an idea of converting our Taskcluster graph to snakemake. I imagine generating a Snakefile using task-graph.json. @AmitMY You can try prototyping that but we won't have any resources in the near future to work on that as we're pushing hard on training the next pack of languages for Firefox. I can review PRs or provide recommendations on implementation but that's about it.
Yeah, if we want to make the pipeline generally usable, this is the most practical way IMO (either a conversion to snakemake, or even to something like Metaflow). The main point is that we'd want something that can reliably dump the current DAG / task payloads into a more generally useful format.
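Generating a Snakefile from the task graph, as suggested above, could start roughly like this. It is a prototype sketch, not an agreed design: each task becomes one rule, dependencies are wired through empty "done" marker files (Snakemake's `touch()` output flag), and the shell command `python run_one_task.py <label>` is a hypothetical runner script. It assumes the same label-keyed JSON with a "dependencies" dict per task, and that labels contain no `{` or `}` (Snakemake formats shell strings with braces).

```python
import json
import re
import shlex


def rule_name(label):
    """Snakemake rule names must be Python identifiers, so sanitize task labels."""
    name = re.sub(r"\W", "_", label)
    return f"t_{name}" if name[0].isdigit() else name


def marker(label):
    """Empty 'done' file standing in for a task's real outputs."""
    return "status/" + re.sub(r"[^A-Za-z0-9._-]", "_", label) + ".done"


def to_snakefile(tasks, runner="python run_one_task.py"):
    """Emit one rule per task, wired together through the marker files."""
    labels = sorted(tasks)
    lines = ["rule all:", "    input:"]
    lines += [f"        {json.dumps(marker(label))}," for label in labels]
    for label in labels:
        deps = sorted(tasks[label].get("dependencies", {}).values())
        lines += ["", f"rule {rule_name(label)}:"]
        if deps:
            lines.append("    input:")
            lines += [f"        {json.dumps(marker(dep))}," for dep in deps]
        command = f"{runner} {shlex.quote(label)}"
        lines += [
            "    output:",
            f"        touch({json.dumps(marker(label))}),",
            "    shell:",
            f"        {json.dumps(command)}",
        ]
    return "\n".join(lines) + "\n"
```

Writing the result of `to_snakefile(tasks)` to a Snakefile would let Snakemake (and its cluster/slurm integrations) handle scheduling and retries. Real data dependencies between tasks would eventually need to replace the marker files.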
Also, depending on what you are trying to do, we could discuss collaboration, access, and support. Don't hesitate to contact me: s@mozilla.com
It's me again, I've been trying to get this repository to work for over two years now. This project still seems like the best machine translation project to train models for production, and for offline use.
Since you completely ditched snakemake, I'll try to get Taskcluster to work again. The documentation at https://mozilla.github.io/firefox-translations-training/task-cluster.html seems to be aimed at maintainers of this specific repository.
Could you please add some information for people outside of this repo? I expect step 1 to be "Fork this repository", and then step 2 to be how to set up this repo with task cluster.