wlandau / crew

A distributed worker launcher
https://wlandau.github.io/crew/

Launch workers asynchronously #133

Closed wlandau closed 1 year ago

wlandau commented 1 year ago

Prework

Proposal

I am working on a plugin for AWS Batch. Each API call to launch a Batch job takes about a second, and we need to get the job ID from the HTTP response. If launches are synchronous, launching 1000 workers could take over 15 minutes (1000 calls at roughly one second each). And unfortunately, array jobs are never going to be a good fit for crew. I expect this sort of problem to be ubiquitous among cloud plugins.
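To make the cost concrete, here is a rough Python sketch (not crew code, and not the AWS Batch API) that compares blocking launches done one at a time against the same launches issued concurrently. `submit_job` is a hypothetical stand-in for a single launch request, and the delay is scaled down from about one second to 10 ms so the example runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def submit_job(index, delay=0.01):
    """Hypothetical stand-in for one blocking launch request.

    The real call would be an HTTP request that returns a job ID;
    here the latency is simulated with a sleep.
    """
    time.sleep(delay)
    return f"job-{index}"


def launch_sequential(n, delay=0.01):
    """Launch n workers one after another: total time is about n * delay."""
    return [submit_job(i, delay) for i in range(n)]


def launch_concurrent(n, delay=0.01, max_in_flight=20):
    """Launch n workers with up to max_in_flight requests pending at once."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # pool.map preserves input order, so job IDs line up with indices.
        return list(pool.map(lambda i: submit_job(i, delay), range(n)))


def timed(fn, *args):
    """Return (result, elapsed seconds) for a call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Back-of-the-envelope estimate from the proposal: 1000 launches at 1 s each.
estimated_minutes = 1000 * 1.0 / 60

sequential_ids, sequential_seconds = timed(launch_sequential, 20)
concurrent_ids, concurrent_seconds = timed(launch_concurrent, 20)
print(f"estimate for 1000 synchronous launches: {estimated_minutes:.1f} min")
print(f"sequential: {sequential_seconds:.3f} s, concurrent: {concurrent_seconds:.3f} s")
```

Both functions return the same job IDs; only the wall-clock time differs, which is the motivation for launching asynchronously.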

Launches should be asynchronous. Fortunately, this is a much smaller problem than the one crew is trying to solve in the first place, because:

  1. Tasks are short, with roughly equal execution times.
  2. Tasks all run locally.
  3. Auto-scaling is not necessary.

This allows us to use mirai directly with local daemons and the passive dispatcher. The launcher can launch and terminate these local daemons in its start() and terminate() methods (which means worker shutdowns need to move to a separate shutdown() method). There can also be a condition variable that each daemon signals if there is a launch error, and launch() can check this condition variable for errors. I am not sure how much of this can be in base crew and how much needs to be in each specific plugin.
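The shape of this idea can be sketched in Python, with a thread pool standing in for the local mirai daemons and a `threading.Condition` standing in for the condition variable. This is only an illustration under those assumptions: the class `AsyncLauncher`, its `job_ids()` helper, and the `submit` callback are hypothetical names, not crew's or mirai's actual API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AsyncLauncher:
    """Hypothetical launcher that submits worker launches in the background."""

    def __init__(self, submit, max_daemons=4):
        self._submit = submit              # blocking call returning a job ID
        self._max_daemons = max_daemons
        self._pool = None
        self._cond = threading.Condition()  # guards _errors and _job_ids
        self._errors = []
        self._job_ids = {}

    def start(self):
        """Start the local background daemons (threads in this sketch)."""
        self._pool = ThreadPoolExecutor(max_workers=self._max_daemons)

    def _launch_one(self, name):
        """Runs on a daemon: perform the slow launch and record the outcome."""
        try:
            job_id = self._submit(name)
            with self._cond:
                self._job_ids[name] = job_id
        except Exception as exc:
            with self._cond:
                self._errors.append((name, exc))
                self._cond.notify_all()    # signal that a launch failed

    def launch(self, name):
        """Queue a launch without blocking; surface earlier failures first."""
        with self._cond:
            if self._errors:
                failed, exc = self._errors.pop(0)
                raise RuntimeError(f"launch of {failed!r} failed") from exc
        self._pool.submit(self._launch_one, name)

    def terminate(self):
        """Wait for pending launches to finish and stop the daemons."""
        self._pool.shutdown(wait=True)
        self._pool = None

    def job_ids(self):
        """Return a snapshot of the job IDs recorded so far."""
        with self._cond:
            return dict(self._job_ids)
```

In this sketch `launch()` returns immediately, and a failure in one background launch is reported on the next call to `launch()`, which mirrors the idea of checking the condition variable for errors. Where the error check lives (base crew or each plugin) is left open, as in the proposal.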