
job: free Your RStudio Console
https://lindeloev.github.io/job/

Set maximum number of concurrent jobs / queue jobs? #56

Open LukasWallrich opened 4 months ago

LukasWallrich commented 4 months ago

Thanks for this excellent package!

It would be great to have an option to queue jobs so that only a reasonable number run concurrently.

The simplest way to allow for that would be to let users start a master job that controls the other jobs. However, while jobs can spawn other jobs via rstudioapi, one cannot use job::job inside job::job (Error: RStudio not running). Could that be changed?

A more complex wrapper that automatically keeps a job pending while at least x other jobs are running would be a nice addition, but that is less important.

# This works - but using job::job instead of jobRunScript does not
job::job(spawn_test = {
  for (i in 1:5) {
    script_path <- tempfile(fileext = ".R")    # avoid shadowing tempfile()
    writeLines("print('Hello')", script_path)  # base R; write_lines() is readr
    rstudioapi::jobRunScript(script_path)
  }
}, import = NULL)
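For what it's worth, a queue could in principle be built on top of rstudioapi::jobRunScript() alone, without nesting job::job(). Here is a minimal sketch, assuming each spawned script reports completion by creating a sentinel file; run_throttled, max_jobs, and the launch argument are all made up for illustration (the launch hook also makes the logic testable outside RStudio):

```r
# Sketch: cap concurrent jobRunScript() calls by polling "done" sentinel files.
# All names here are hypothetical; `launch` defaults to rstudioapi::jobRunScript
# but can be swapped for a synchronous stand-in when testing outside RStudio.
run_throttled <- function(script_paths, max_jobs = 2, poll_secs = 1,
                          launch = rstudioapi::jobRunScript) {
  done_dir <- tempfile("done_")
  dir.create(done_dir)
  n_done <- function() length(list.files(done_dir))
  launched <- 0
  for (path in script_paths) {
    # Block until fewer than max_jobs scripts are still running
    while (launched - n_done() >= max_jobs) Sys.sleep(poll_secs)
    # Append a line so the script creates its sentinel when it finishes
    sentinel <- file.path(done_dir, sprintf("job_%d.done", launched))
    cat(sprintf("\nfile.create('%s')\n", sentinel), file = path, append = TRUE)
    launch(path)
    launched <- launched + 1
  }
  done_dir
}
```

This is only a polling loop, not a proper scheduler, but it avoids the nested-job problem entirely because the master loop runs in the main session (or in a single job::job).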
lindeloev commented 3 months ago

Ah, interesting idea, @LukasWallrich! I've definitely needed this a few times as well. Could you post an idea of what the API could look like? (If we just pretend everything is possible.) Here are two ideas off the top of my head:

1: Set max using options() and then just launch jobs independently

Pros: quite intuitive/low-tech.
Cons: for large environments, they have to be exported from main for every job.

options(max_concurrent_jobs = 5)
all_job_settings = list(list(a = 1), list(a = 2), list(a = 3))  # ... etc

for (i in 1:100) {
    job_setting = all_job_settings[[i]]
    job::job({
        print(job_setting$a)
    })
}

2: Arguments

Iterates through a list of lists and loads the list members into the global environment within each job.
Pros: faster startup of each job due to only one export-from-main.
Cons: feels a bit more "invisible"/"magic".

all_job_settings = list(list(a = 1), list(a = 2), list(a = 3))  # ... etc
job::job({
    print(a)
}, import_list = all_job_settings, max_concurrent_jobs = 5)
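To make the import_list idea concrete, here is a tiny stand-in for the proposed semantics. run_with_settings is invented for illustration and evaluates the block in-process rather than in a background job: each element of the outer list becomes one run, and its named members are loaded as variables before the code block executes.

```r
# Hypothetical illustration of `import_list` semantics; evaluates in-process
# instead of spawning background jobs.
run_with_settings <- function(expr, import_list) {
  for (settings in import_list) {
    env <- list2env(settings, envir = new.env())  # list members -> variables
    eval(expr, envir = env)
  }
}

all_job_settings <- list(list(a = 1), list(a = 2), list(a = 3))
run_with_settings(quote(print(a)), all_job_settings)  # one run per settings list
```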
chenyu-psy commented 2 months ago

Hi,

I also have a similar need and have done some work on it.

In my work, I usually use job to run computational models that may take a long time (e.g., a few hours), and each model can be set to use multiple cores. Therefore, I need to manage not only the number of jobs running concurrently but also how many cores are available on my machine. For this purpose, I create a temporary file to store job information. Every time a new job is added, a new log line is appended, recording the index, name, required cores, priority, and status of that job. Every few seconds, each job reads the job log and updates the queue list. If the current job is at the top of the queue and there are sufficient cores on the machine, it starts to run. Otherwise, it waits until all the requirements are met.
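A stripped-down sketch of that log-file protocol (the real code is in the smartr repository; these function names and the CSV layout are my own invention for illustration):

```r
# Each job appends one line to a shared log; waiting jobs re-read the log to
# decide whether enough cores are free. Names and format are illustrative only.
log_path <- tempfile(fileext = ".csv")

register_job <- function(index, name, cores, priority, status = "pending") {
  cat(sprintf("%d,%s,%d,%d,%s\n", index, name, cores, priority, status),
      file = log_path, append = TRUE)
}

read_log <- function() {
  read.csv(log_path, header = FALSE,
           col.names = c("index", "name", "cores", "priority", "status"))
}

cores_in_use <- function() {
  log <- read_log()
  sum(log$cores[log$status == "running"])
}

# A pending job may start only if its core request fits in what is left over
can_start <- function(cores_needed, total_cores) {
  cores_in_use() + cores_needed <= total_cores
}
```

In the real version each job would also update its own status line (running, done) and respect priority order; the sketch only shows the core-counting part.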

This work has been uploaded to my own repository, smartr. Unfortunately, I have only been working on it for a week, so it is a bit messy and poorly documented. If you are interested in adding these features to the job package, I am happy to contribute my code. Of course, if you feel these features are too focused on my needs, we can keep them independent.