Launching workers, sending jobs, polling workers, and receiving output may take a long time and block the main process.
This blocking is not so bad for a callr-based crew, especially if the scheduler does not need to send much data (think tar_target(retrieval = "worker")).
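For context, a minimal `_targets.R` fragment showing the setting mentioned above (`retrieval = "worker"` is a real `targets` option; the `storage = "worker"` companion setting is included here as an assumption about a typical low-traffic configuration):

```r
# In _targets.R: with retrieval = "worker", each worker reads its own
# upstream dependencies straight from the data store instead of receiving
# them from the main process, so the scheduler sends very little data.
library(targets)
tar_option_set(
  storage = "worker",   # workers save their own results
  retrieval = "worker"  # workers load their own dependencies
)
```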
A pipeline could have thousands of high-performance-computing workers, but a local machine can only support a handful of processes for local scheduling.
Proposal
Establish an outer crew with callr workers. Each outer worker runs a crew of its own. These inner crews interface with AWS Batch, GCP, etc., where launching and polling cost the most in runtime and blocking. Because all this slow work happens in outer callr workers, the main process stays unblocked, and it is straightforward to control the number of outer vs. inner workers.
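The nesting could look roughly like the sketch below. This is a hypothetical illustration, not a tested implementation: it uses `crew_controller_local()` as a stand-in for both the callr-based outer crew and the cloud-backed inner crews (a real deployment would swap the inner controller for an AWS Batch / GCP launcher), and whether nested controllers behave well in practice is exactly the open question of this issue.

```r
library(crew)

# Outer crew: stands in for the callr workers described above.
outer <- crew_controller_local(name = "outer", workers = 2)
outer$start()

# Each outer task runs an inner crew of its own. In a real deployment,
# the inner controller would launch AWS Batch / GCP workers, so the
# expensive launching and polling happens here, off the main process.
outer$push(
  name = "nested",
  command = {
    inner <- crew::crew_controller_local(name = "inner", workers = 2)
    inner$start()
    inner$push(name = "task", command = Sys.getpid())
    inner$wait()
    out <- inner$pop()$result[[1]]
    inner$terminate()
    out
  }
)

outer$wait()
result <- outer$pop()$result[[1]]  # PID of an inner worker
outer$terminate()
result
```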
Difficulty
I think this is perfectly doable with nested crews. We just need #2 to make sure jobs can access inner crews.