mlr-org / mlr3misc

Miscellaneous helper functions for mlr3
https://mlr3misc.mlr-org.com
GNU Lesser General Public License v3.0

parallelization of jobs with priority queue #36

Closed mb706 closed 7 months ago

mb706 commented 4 years ago

Unfortunately the futures package does not offer this (and R seems to suck at synchronized parallelization in general) so we would have to hack this in a way using polling.

I can write this, but I am wondering if this should be in misc or belongs to its own package? @berndbischl @mllg

berndbischl commented 4 years ago

this should really not go into misc but its own package. and this is not a small project. if we ever want to do this i think we should involve other dedicated external people

mllg commented 4 years ago

I'm not sure what you want to have exactly

mb706 commented 4 years ago

This would be necessary for proper parallelization in mlr3pipelines

berndbischl commented 4 years ago

@mb706 Michel wants you to post some details describing what you want....

mb706 commented 4 years ago

Ah for some reason I read mllg's comment as "what you want to have this for, exactly"...

The problem I want to solve is that we have "jobs" we want to parallelize, but some of the jobs themselves create new jobs.

- Example mlr3pipelines: whenever a "job" (here the `train()` or `predict()` call of a `PipeOp`) finishes, it should check whether any of its successor nodes in the `Graph` is ready to be run; if so, the successor node should be enqueued.
- Other example Hyperband: whenever more than half of the evaluations of a stage & bracket have finished, evaluations with higher budgets can be queued.
- Final example asynchronous MBO: if there are N threads evaluating points, then whenever one evaluation finishes, its result should be added to the archive and a new job should be started that proposes a new point and evaluates it.

One way to go about this is to have some kind of `Queue` object that is shared between processes. The processes then run worker loops that block on the `Queue$pop()` operation, react to whatever comes out of it, and write the result to a different `Queue`. This could be done using some kind of shared database (the cleanest solution, but it has library dependencies and requires network setup) or in some hacky way through the file system (works out of the box on clusters, but ugly and probably slow).
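To make the worker-loop idea concrete, here is a minimal in-process sketch in Python, using `queue.Queue` and threads purely as a stand-in for whatever shared medium (database or file system) an actual cross-process implementation would use; all names here are illustrative, not part of any mlr3 API:

```python
import queue
import threading

def worker(jobs: queue.Queue, results: queue.Queue) -> None:
    """Block on pop, run the job, publish the result; stop on a None sentinel."""
    while True:
        job = jobs.get()        # blocks until a job is available
        if job is None:         # sentinel: shut down this worker
            break
        results.put(job())      # run the job, push its result to the other queue

jobs, results = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(2)]
for t in threads:
    t.start()

for i in range(4):
    jobs.put(lambda i=i: i * i)  # enqueue four simple jobs
for _ in threads:
    jobs.put(None)               # one sentinel per worker
for t in threads:
    t.join()

out = sorted(results.get() for _ in range(4))
print(out)  # [0, 1, 4, 9]
```

The point of the pattern is that a worker that pops a job can itself push new jobs onto `jobs` before writing to `results`, which is exactly what the pipeline/Hyperband/MBO scenarios above need.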

Another solution would be to use the future package: start job evaluations through the `f[[i]] <- future()` call, and then poll the list of futures through repeated calls of `any(sapply(f, resolved))`. We could put an API around that similar to Python's `multiprocessing.apply_async()`:

```r
apply_async <- function(fun, args, callback_fun, priority = 0)
```

so that fun(args) gets run on a core / remote machine as a "future" whenever a core is ready, and callback_fun(result_of_fun) gets called on the main thread when fun returns. The callback would then be able to enqueue more jobs depending on the result.
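For reference, Python's multiprocessing API already has roughly this shape. A small sketch of the intended semantics, using a `ThreadPool` (so the illustrative lambda jobs need no pickling) where the callback inspects each result and enqueues a follow-up job, which is the behavior proposed above:

```python
import time
from multiprocessing.pool import ThreadPool

pool = ThreadPool(2)
results = []

def callback(res):
    """Called by the pool when a job finishes; may enqueue further jobs."""
    results.append(res)
    if res < 8:  # keep chaining follow-up jobs until the result reaches 8
        pool.apply_async(lambda x: x * 2, (res,), callback=callback)

# Kick off the first job: 1 -> 2 -> 4 -> 8, each step scheduled by the callback.
pool.apply_async(lambda x: x * 2, (1,), callback=callback)
while len(results) < 3:   # wait for the three-step chain to finish
    time.sleep(0.01)
pool.terminate()
pool.join()

print(results)  # [2, 4, 8]
```

One caveat for an R version built on futures: there is no callback hook, so the main process would have to poll `resolved()` in a loop and invoke `callback_fun` itself whenever a future turns out to be done.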

sebffischer commented 7 months ago

I guess this can be closed since Marc is now working on rush?

mb706 commented 7 months ago

Does rush actually do this?

sebffischer commented 7 months ago

I somehow just assumed that it addresses this issue