casperdcl opened 2 years ago
`NODE_INDEX` & `NODE_TOTAL`: front-end versus back-end

Note that running different code on each instance is not easy: determining the node index requires a few orchestrator building blocks.
idk what you mean by "different code". I'm talking about the same code taking a different logic branch owing to different env vars.
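```python
# script
import math
import os

def do_work(step):
    pass  # hypothetical placeholder for the real per-task work

index = int(os.environ.get('TPI_PARALLEL_INDEX', 0))  # 0-based index of this instance
total = int(os.environ.get('TPI_PARALLEL_TOTAL', 1))  # number of parallel instances
tasks = 1337
batch_size = int(math.ceil(tasks / total))
# each instance works on one contiguous slice of tasks; the last slice may be shorter
for step in range(index * batch_size, min((index + 1) * batch_size, tasks)):
    do_work(step)
```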
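A quick way to check the slicing arithmetic is to pull it into a function and confirm that the per-instance ranges cover every task exactly once. `shard_range` is a hypothetical helper name, not part of any provider API; like the script, it assumes 0-based indices.

```python
import math


def shard_range(index: int, total: int, tasks: int) -> range:
    """Return the task ids handled by instance `index` (0-based) out of `total`."""
    batch_size = math.ceil(tasks / total)
    start = index * batch_size
    # Clamp the end at `tasks` so the last instance does not run past the final task;
    # instances whose start is already past `tasks` get an empty range.
    return range(start, min(start + batch_size, tasks))


if __name__ == "__main__":
    total, tasks = 8, 1337
    covered = [step for i in range(total) for step in shard_range(i, total, tasks)]
    # Every task appears exactly once, in order, across all instances.
    assert covered == list(range(tasks))
    print([len(shard_range(i, total, tasks)) for i in range(total)])
    # → [168, 168, 168, 168, 168, 168, 168, 161]
```

Contiguous slices keep each instance's work local; a strided split (`range(index, tasks, total)`) would balance just as well if neighbouring tasks differ a lot in cost.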
> I'm talking about same code, different logic-branch

Also known as "different code" or, in other words, function parallelism.
`PARALLEL_TOTAL` is the same as `parallelism` and is straightforward to implement.
`PARALLEL_INDEX` is not straightforward to implement: it requires synchronization to avoid having several machines with the same index.
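To make the synchronization requirement concrete: if each machine picked its own index by reading a shared counter and then writing back counter + 1, two machines booting at the same moment could read the same value. The claim has to be one atomic step. The sketch below simulates machines with threads and uses `threading.Lock` as a stand-in for whatever atomic primitive a real backend would provide (an assumption); `IndexRegistry` and `claim` are hypothetical names.

```python
import threading


class IndexRegistry:
    """Hands out unique 0-based indices; the lock stands in for a distributed atomic op."""

    def __init__(self, total: int) -> None:
        self._total = total
        self._next = 0
        self._lock = threading.Lock()

    def claim(self) -> int:
        # Check-read-increment must happen as one atomic step; without the lock,
        # two callers could both read the same value of self._next.
        with self._lock:
            if self._next >= self._total:
                raise RuntimeError("all indices already claimed")
            index = self._next
            self._next += 1
            return index


if __name__ == "__main__":
    total = 8
    registry = IndexRegistry(total)
    claimed = []

    def boot() -> None:
        # Each simulated machine claims its index once, at startup.
        claimed.append(registry.claim())

    machines = [threading.Thread(target=boot) for _ in range(total)]
    for machine in machines:
        machine.start()
    for machine in machines:
        machine.join()
    print(sorted(claimed))  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

This only covers the boot-time race: a replacement for a terminated machine would never get the freed index back.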
If you add this to "in progress", expect me to spend a couple of weeks doing what we're supposed to do two quarters from now, i.e. deciding whether to reinvent the orchestrator[^1] and, if advisable, reinventing it.
[^1]: It always begins with Raft & Serf, and then you feel the need to add a command-line tool, some extra supporting services... and you end up with an orchestrator, identical to the existing ones but admittedly less elegant.
> `PARALLEL_INDEX` is not straightforward to implement

Really? Argh. Backlogging.
Note to future self: it's also possible to hack something with a cloud-managed atomic queue, popping items when instances boot and pushing them when they're about to terminate. 🤷🏼♂️
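The queue idea could look roughly like this: the pool is pre-filled with every index from 0 to TOTAL - 1, an instance pops one on boot and pushes it back before it terminates, so a replacement instance inherits the freed index. Python's `queue.Queue` stands in for the cloud-managed atomic queue (an assumption; a real one would be a managed service with its own API), and `IndexPool`, `acquire` and `release` are hypothetical names.

```python
import queue


class IndexPool:
    """Free parallel indices; queue.Queue stands in for a cloud-managed atomic queue."""

    def __init__(self, total: int) -> None:
        self._free = queue.Queue()
        for index in range(total):
            self._free.put(index)

    def acquire(self, timeout: float = 1.0) -> int:
        # Called when an instance boots; get() is atomic, so no two instances
        # receive the same index. Raises queue.Empty if none frees up in time.
        return self._free.get(timeout=timeout)

    def release(self, index: int) -> None:
        # Called when an instance is about to terminate, so a replacement
        # instance can pick up the same index (and the same slice of work).
        self._free.put(index)


if __name__ == "__main__":
    pool = IndexPool(total=3)
    a, b, c = pool.acquire(), pool.acquire(), pool.acquire()
    print(a, b, c)  # → 0 1 2 (FIFO order)
    pool.release(b)  # the instance holding index 1 is terminating
    print(pool.acquire())  # → 1, picked up by its replacement
```

Instances that die without calling `release` (a crash, a spot interruption) would leak their index; handling that needs something like leases or visibility timeouts.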
Another dodecagonal wheel.
Another hacky possibility: two instance groups, one for the leader instance and another for the workers.
Re-commenting here for better context.

> I came across this PR while looking for this feature with AWS EC2. I think the ability to operate parallel instances with regular cloud providers and have some sort of indexing, or any mechanism, to dispatch work to the different instances can greatly help small teams and individual developers who don't have resources to manage k8s.

_Originally posted by @redabuspatrol in https://github.com/iterative/terraform-provider-iterative/issues/597#issuecomment-1183537070_
- `CIRCLE_NODE_INDEX` & `CIRCLE_NODE_TOTAL` in CircleCI
- `CI_NODE_INDEX` & `CI_NODE_TOTAL` in GitLab

```hcl
parallelism = 8
script = "... some_conditional_fork_and_join_code($TPI_PARALLEL_INDEX, $TPI_PARALLEL_TOTAL) ..."
```
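For comparison with the CI systems listed above, a script could read whichever convention is present. The two differ in numbering: CircleCI's `CIRCLE_NODE_INDEX` is 0-based, while GitLab's `CI_NODE_INDEX` starts at 1. `TPI_PARALLEL_INDEX`/`TPI_PARALLEL_TOTAL` are only the names proposed in this thread (assumed 0-based here), not variables the provider is known to set, and `parallel_position` is a hypothetical helper.

```python
import os
from typing import Mapping, Tuple


def parallel_position(env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Return (0-based index, total) from whichever parallelism convention is set."""
    # (index variable, total variable, offset that makes the index 0-based)
    conventions = [
        ("TPI_PARALLEL_INDEX", "TPI_PARALLEL_TOTAL", 0),  # proposed in this thread
        ("CIRCLE_NODE_INDEX", "CIRCLE_NODE_TOTAL", 0),  # CircleCI: 0-based
        ("CI_NODE_INDEX", "CI_NODE_TOTAL", 1),  # GitLab: 1-based
    ]
    for index_var, total_var, offset in conventions:
        if index_var in env and total_var in env:
            return int(env[index_var]) - offset, int(env[total_var])
    return 0, 1  # not running in parallel: a single instance does everything


if __name__ == "__main__":
    print(parallel_position({"CI_NODE_INDEX": "3", "CI_NODE_TOTAL": "8"}))  # → (2, 8)
    print(parallel_position({}))  # → (0, 1)
```

The returned pair plugs directly into the `index`/`total` slicing logic of the script earlier in the thread.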