AmitMY opened 9 months ago
I'm not a taskcluster expert, and maybe others can chime in here.
This has information on the taskgraph that is generated: https://taskcluster-taskgraph.readthedocs.io/en/latest/
If you run utils/preflight_check.py, it will generate a local taskgraph that you can inspect. It is located in the /artifacts directory in the repo. I know there is an artifacts/run-task in there. The artifacts/full-task-graph.json file contains all of the tasks that need to run.
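To get a quick overview of what the generated graph contains, a short script can list each task with its kind and the labels it depends on. This is a sketch that assumes full-task-graph.json maps each task label to an object with a `kind` string and a `dependencies` dict (dependency name to label), which is how I understand Taskgraph serializes tasks; check the field names against your generated file.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_task_graph(graph: dict) -> list[tuple[str, str, list[str]]]:
    """Return (label, kind, sorted dependency labels) for every task in the graph.

    Assumes each value has a "kind" string and a "dependencies" dict mapping
    dependency names to task labels (assumed Taskgraph layout).
    """
    rows = []
    for label, task in sorted(graph.items()):
        deps = sorted(task.get("dependencies", {}).values())
        rows.append((label, task.get("kind", "?"), deps))
    return rows


def count_kinds(graph: dict) -> Counter:
    """Count how many tasks belong to each kind."""
    return Counter(task.get("kind", "?") for task in graph.values())


if __name__ == "__main__":
    # Path produced by the preflight check, relative to the repo root.
    graph = json.loads(Path("artifacts/full-task-graph.json").read_text())
    for kind, n in count_kinds(graph).most_common():
        print(f"{kind}: {n} tasks")
    for label, kind, deps in summarize_task_graph(graph):
        print(f"{label} [{kind}] <- {', '.join(deps) or '(no dependencies)'}")
```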
How Taskcluster works beyond that is outside my understanding of the system.
The https://chat.mozilla.org/#/room/#taskcluster:mozilla.org room may be able to answer questions.
I'm getting the task graph using:
make preflight-check
but run-task seems to need to run on the servers, not on my client. I still can't figure out how to run it outside of CI, though.
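Running tasks by hand outside CI would at least require executing them in dependency order. A minimal sketch, assuming the same full-task-graph.json layout (task labels mapping to objects whose `dependencies` dict values are other labels); it only computes an order using Kahn's algorithm and does not execute anything.

```python
import heapq
import json
from pathlib import Path


def topological_order(graph: dict) -> list[str]:
    """Order task labels so every task comes after all of its dependencies.

    Kahn's algorithm with a min-heap, so ties are broken alphabetically and the
    result is deterministic. Raises ValueError on cycles or unknown labels.
    """
    deps = {label: set(task.get("dependencies", {}).values()) for label, task in graph.items()}
    for label, needed in deps.items():
        missing = needed.difference(deps)
        if missing:
            raise ValueError(f"{label} depends on unknown tasks: {sorted(missing)}")

    # Reverse edges: for each task, the tasks waiting on it.
    dependents = {label: [] for label in deps}
    for label, needed in deps.items():
        for dep in needed:
            dependents[dep].append(label)

    remaining = {label: len(needed) for label, needed in deps.items()}
    ready = [label for label, n in remaining.items() if n == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        label = heapq.heappop(ready)
        order.append(label)
        for child in dependents[label]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, child)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected in task graph")
    return order


if __name__ == "__main__":
    graph = json.loads(Path("artifacts/full-task-graph.json").read_text())
    for i, label in enumerate(topological_order(graph), 1):
        print(i, label)
```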
My goal is:
Apologies for the slow reply - I didn't see this issue until now.
It is technically possible to run your own Taskcluster instance and run training on it, although I'm not sure I would advise it. Roughly, the steps would be:
docker-compose up with https://github.com/taskcluster/taskcluster
The Taskcluster channel that @gregtatum linked to is usually pretty keen to help others get the core Taskcluster services working, but I'm not sure how much guidance they'll be able to offer on Translations-specific things, nor can I commit to helping with this.
Another option that we have discussed for the future is to build a feature in Taskgraph to generate a Snakemake definition in addition to a Taskcluster one. We are not sure if/when we'll be able to build it though.
Thanks @bhearsum - I guess since I don't really have permissions on Mozilla's cluster, my only course of action is to set up a new instance.
@marco-c that would be swell! I think that would allow for much easier experimentation for researchers. Until now, I had been running it in a Docker container on a single 4-GPU machine, and it worked fine, except that the translation performance was poor. Now that many bugs should be fixed, I wanted to try again, but the Snakemake definitions are out of date.
Since you are not maintaining Snakemake, I'd like to use Taskcluster. I read these instructions, https://github.com/mozilla/firefox-translations-training/blob/main/docs/task-cluster.md, which seem to say that training runs are triggered from the repository's CI.
I would like to run Taskcluster locally and configure it to use my GCP instance.
Seems like I need to start with
Now, opening http://taskcluster brings up Taskcluster.
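Before submitting anything, it can help to confirm that the local services answer API calls, not just that the UI loads. A minimal standard-library sketch; it assumes the local root URL is http://taskcluster (as above) and relies on the Taskcluster convention that service APIs live under {rootUrl}/api/{service}/v1/ and expose a ping endpoint. The official taskcluster Python client builds the same URLs from a rootUrl if you prefer to use it.

```python
import json
import urllib.request


def api_url(root_url: str, service: str, version: str, path: str) -> str:
    """Build a Taskcluster API URL in the {rootUrl}/api/{service}/{version}/{path} layout."""
    return f"{root_url.rstrip('/')}/api/{service}/{version}/{path.lstrip('/')}"


def ping(root_url: str, service: str = "queue", timeout: float = 5.0) -> dict:
    """Call a service's ping endpoint and return the decoded JSON response."""
    url = api_url(root_url, service, "v1", "ping")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Assumed root URL of the local deployment; adjust if yours differs.
    for service in ("queue", "auth", "worker-manager"):
        print(service, ping("http://taskcluster", service))
```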
From here, how can I push the task group in this repository to Taskcluster? I feel the tutorial should cover that. Also, will the tasks spawn GCP workers as needed, or do those need to be created ahead of time?
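Taskgraph normally submits the graph itself from a decision task, so the following is only an assumption about how a pre-generated graph might be pushed by hand. It assigns each label a slugid-style taskId, rewrites label dependencies into taskIds, and orders tasks so dependencies are submitted first. It assumes each full-task-graph.json entry has a `task` definition and a `dependencies` dict of name to label, and it does NOT resolve Taskgraph placeholders such as task-reference or relative-datestamp, which a real submission would need.

```python
import base64
import uuid


def make_slug_id() -> str:
    """Return a 22-character URL-safe id in the style of "nice" slugids.

    Clearing the top bit of the first byte keeps the id from starting with "-" or "_".
    """
    raw = bytearray(uuid.uuid4().bytes)
    raw[0] &= 0x7F
    return base64.urlsafe_b64encode(bytes(raw)).decode("ascii").rstrip("=")


def prepare_submissions(graph: dict, task_group_id: str) -> list[tuple[str, dict]]:
    """Turn a label-keyed graph into (taskId, definition) pairs, dependencies first."""
    ids = {label: make_slug_id() for label in graph}
    order, visiting, done = [], set(), set()

    def visit(label: str) -> None:
        # Depth-first post-order: a task is appended only after its dependencies.
        if label in done:
            return
        if label in visiting:
            raise ValueError(f"dependency cycle at {label}")
        visiting.add(label)
        for dep in sorted(graph[label].get("dependencies", {}).values()):
            visit(dep)
        visiting.discard(label)
        done.add(label)
        order.append(label)

    for label in sorted(graph):
        visit(label)

    submissions = []
    for label in order:
        definition = dict(graph[label]["task"])  # shallow copy; input is left untouched
        definition["taskGroupId"] = task_group_id
        dep_labels = sorted(graph[label].get("dependencies", {}).values())
        definition["dependencies"] = [ids[d] for d in dep_labels]
        submissions.append((ids[label], definition))
    return submissions
```

Each pair could then be passed, in order, to the official Python client, for example `taskcluster.Queue({"rootUrl": ..., "credentials": ...}).createTask(task_id, definition)`, with the group id itself generated by `taskcluster.slugId()`.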