Open LukasWallrich opened 4 months ago
Ah, interesting idea, @LukasWallrich! I've definitely needed this a few times as well. Could you post an idea of what the API could look like? (Let's just pretend everything is possible.) Here are two ideas off the top of my head:
Pros: quite intuitive/low-tech. Cons: for large environments, they have to be exported from main for every job.
options(max_concurrent_jobs = 5)
all_job_settings = list(list(a = 1), list(a = 2), list(a = 3))  # ... etc
for (i in seq_along(all_job_settings)) {
  job_setting = all_job_settings[[i]]
  job::job({
    print(job_setting$a)
  })
}
This one iterates through a list of lists and loads the list members into the global environment within each job. Pros: faster startup of each job due to only one export-from-main. Cons: feels a bit more "invisible"/"magic".
all_job_settings = list(list(a = 1), list(a = 2), list(a = 3))  # ... etc
job::job({
  print(a)
}, import_list = all_job_settings, max_concurrent_jobs = 5)
Hi,
I also have a similar need and have done some work on it.
In my work, I usually use job to run computational models that may take a lot of time (e.g., a few hours), and the model can be set to use multiple cores. Therefore, I am not only taking care of the number of jobs running concurrently but also how many cores are available on my machine. For this purpose, I create a temporary file to store job information. Every time a new job is added, a new line of log is added. It records the index, name, require cores, priority and status of the current job. Every few seconds, each job will read the job log and update the queue list. If the current job is at the top of the queue list and there are sufficient cores on the machine, the current job will start to run. Otherwise, the job will wait until all the requirements are met.
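The mechanism above can be sketched in a few lines of plain R. This is a minimal illustration of the idea, not the actual smartr code; the helper names (`register_job`, `can_start`) and the CSV log layout are assumptions for the example.

```r
# Shared log file: one line per job with index, name, cores, priority, status.
log_file <- tempfile("job_log_")

# Append a record for a new job (hypothetical helper, not from smartr).
register_job <- function(index, name, cores, priority, status = "waiting") {
  line <- paste(index, name, cores, priority, status, sep = ",")
  cat(line, "\n", file = log_file, append = TRUE, sep = "")
}

read_log <- function() {
  read.csv(log_file, header = FALSE,
           col.names = c("index", "name", "cores", "priority", "status"))
}

# A job may start if it heads the queue of waiting jobs (highest priority,
# then lowest index) and enough cores are free after subtracting those
# claimed by running jobs. Each job would poll this every few seconds.
can_start <- function(index, total_cores) {
  log <- read_log()
  free <- total_cores - sum(log$cores[log$status == "running"])
  waiting <- log[log$status == "waiting", ]
  waiting <- waiting[order(-waiting$priority, waiting$index), ]
  nrow(waiting) > 0 && waiting$index[1] == index && waiting$cores[1] <= free
}

register_job(1, "model_A", cores = 4, priority = 1, status = "running")
register_job(2, "model_B", cores = 4, priority = 2)
register_job(3, "model_C", cores = 2, priority = 1)

can_start(2, total_cores = 8)  # TRUE: top of queue, 4 free cores suffice
can_start(3, total_cores = 8)  # FALSE: model_B is ahead in the queue
```

In a real setup, writes to the shared log would also need some locking (e.g., via the filelock package) to avoid two jobs claiming the same cores at once.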
This work has been uploaded to my own repository, smartr. Unfortunately, I have only been working on this repository for a week, so it is a bit messy and poorly documented. If you are interested in adding these features to the job package, I am happy to contribute my code. But of course, if you feel that these features are too focused on my needs, we can keep them independent.
Thanks for this excellent package!
It would be great to have an option to queue jobs so that only a reasonable number run concurrently.
The simplest way to allow for that would be to enable users to start a master job that then controls the other jobs - but while jobs can spawn other jobs with the rstudioapi, one cannot use job::job inside job::job (Error: RStudio not running). Could that be changed? A more complex wrapper that automatically keeps a job pending until at most x other jobs are running would be a nice addition, but is less important.