parallel: id & examples #585

Open casperdcl opened 2 years ago

Expose task index via environment variables, similar to:
- CIRCLE_NODE_INDEX & CIRCLE_NODE_TOTAL in CircleCI
- CI_NODE_INDEX & CI_NODE_TOTAL in GitLab
Add a minimal working example to the docs using parallelism = 8, script = "... some_conditional_fork_and_join_code($TPI_PARALLEL_INDEX, $TPI_PARALLEL_TOTAL) ..."

`NODE_INDEX` & `NODE_TOTAL` — front-end versus back-end

Note that running different code on each instance is not easy: determining the node index requires a few orchestrator building blocks.

idk what you mean by different code. I'm talking about same code, different logic-branch owing to different env vars.

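#script
import math
import os

# Environment variables are strings, so convert them to integers.
index = int(os.environ.get('TPI_PARALLEL_INDEX', 0))
total = int(os.environ.get('TPI_PARALLEL_TOTAL', 1))

tasks = 1337
batch_size = int(math.ceil(tasks / total))
# Each instance handles one contiguous batch; the last batch is clipped to `tasks`.
for step in range(index * batch_size, min((index + 1) * batch_size, tasks)):
    do_work(step)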
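
To sanity-check the batching arithmetic, here is a rough sketch that wraps it in a function and confirms that, with parallelism = 8, every step is covered exactly once. `batch_range` is a hypothetical helper name, not something the provider ships.

```python
import math


def batch_range(index: int, total: int, tasks: int) -> range:
    """Return the steps assigned to instance `index` out of `total` instances."""
    batch_size = math.ceil(tasks / total)
    # Clip both ends so trailing instances never run past `tasks`.
    start = min(index * batch_size, tasks)
    stop = min((index + 1) * batch_size, tasks)
    return range(start, stop)


if __name__ == "__main__":
    tasks, total = 1337, 8
    covered = [
        step
        for index in range(total)
        for step in batch_range(index, total, tasks)
    ]
    # Every step appears exactly once, in order.
    assert covered == list(range(tasks))
    print([len(batch_range(index, total, tasks)) for index in range(total)])
```

Each instance would call `batch_range` with its own index, so the code is identical everywhere and only the environment differs.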

> I'm talking about same code, different logic-branch

Also known as “different code” or, in other words, function parallelism.

PARALLEL_TOTAL is the same as parallelism and is straightforward to implement.

PARALLEL_INDEX is not straightforward to implement: it requires synchronization to avoid having several machines with the same index.

If you add this to “in progress”, expect me to spend a couple weeks doing what we're supposed to do two quarters from now; i.e. determine whether to reinvent the orchestrator[^1] or not and, if advisable, reinvent it.

[^1]: It always begins with Raft & Serf, and then you feel the need to add a command-line tool, some extra supporting services... and you have an orchestrator, identical to the existing ones, but admittedly less elegant.

> PARALLEL_INDEX is not straightforward to implement

Really? Argh. Backlogging.

Note to future self: it's also possible to hack something with a cloud-managed atomic queue, popping items when instances boot and pushing them when they're about to terminate. 🤷🏼‍♂️
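
A minimal sketch of that queue idea, assuming a thread-safe in-process `queue.Queue` as a local stand-in for the cloud-managed queue. `IndexPool`, `acquire` and `release` are hypothetical names, and a real version would need the queue service's own client and some handling for instances that die without releasing their index.

```python
import queue


class IndexPool:
    """Hands out unique indices in [0, total).

    Local stand-in for a cloud-managed atomic queue (assumption).
    """

    def __init__(self, total: int) -> None:
        self._free = queue.Queue()
        for index in range(total):
            self._free.put(index)

    def acquire(self) -> int:
        # Called when an instance boots.
        # Raises queue.Empty if every index is already taken.
        return self._free.get_nowait()

    def release(self, index: int) -> None:
        # Called when an instance is about to terminate,
        # so a replacement instance can reuse the index.
        self._free.put(index)


if __name__ == "__main__":
    pool = IndexPool(total=3)
    first, second, third = pool.acquire(), pool.acquire(), pool.acquire()
    assert sorted([first, second, third]) == [0, 1, 2]
    pool.release(second)               # instance terminates
    assert pool.acquire() == second    # replacement reuses its index
```

Because each pop is atomic, no two live instances can hold the same index, which is the synchronization that PARALLEL_INDEX needs.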

Another dodecagonal wheel.

Another hacky possibility: two instance groups, one for the leader instance and another for the workers.

Re-commenting here for better context.

I came across this PR while looking for this feature with AWS EC2. I think the ability to operate parallel instances with regular cloud providers and have some sort of indexing, or any mechanism, to dispatch work to the different instances can greatly help small teams and individual developers who don't have resources to manage k8s.

Originally posted by @redabuspatrol in https://github.com/iterative/terraform-provider-iterative/issues/597#issuecomment-1183537070
