Closed jeffreyhanson closed 3 years ago
Ok - can you please pull the latest version and try it again?
Alright, this is working now. Thanks very much for figuring this out!
I've had the benchmarking running (`03-analysis.R`) overnight, and so far only 23 runs have saved results.

At the beginning of this code snippet (starting line 60), it seems that it's not running things in parallel (`.parallel = FALSE`). Is that the case? If so, is there a reason for this, or could we run things in parallel? That would really speed things up here.
```r
benchmark_results <-
  benchmark_results %>%
  dplyr::sample_frac() %>% # randomize benchmark order
  dplyr::mutate(id2 = seq_len(nrow(.))) %>%
  plyr::ddply(
    "id2",
    .parallel = FALSE, # exists("cl"),
    .progress = ifelse(exists("cl"), "none", "text")
```
Does `general_parameters$threads` determine the number of parallel processes? I thought this would be related to how many threads to use for prioritizr, but I guess that's not the case. Is that correct?

I'm currently using a `general_parameters$threads` of 10. As you can see below, RAM keeps creeping up. It's now at 220 GB. Do you have ideas on how to reduce the memory footprint of the analysis? It would be great to use more than 10 threads.
> At the beginning of this code snippet (starting line 60), it seems that it's not running things in parallel (`.parallel = FALSE`). Is that the case? If so, is there a reason for this, or could we run things in parallel? That would really speed things up here.
Yeah, that's correct, because including lpsymphony will fork bomb the benchmark analysis (i.e. it will set threads to the total number of cores on the system, so if you try to run other solvers at the same time then the total number of threads being used is much greater than the total cores, per #5). So, we can either include lpsymphony, or we can run the benchmark runs in parallel. We can't do both (unless lpsymphony gets an update).
> Does `general_parameters$threads` determine the number of parallel processes? I thought this would be related to how many threads to use for prioritizr, but I guess that's not the case. Is that correct?
`general_parameters$threads` specifies the total number of available threads, and `benchmark_parameters$threads` specifies the number of threads to use per run. E.g. if `general_parameters$threads = 6` and `benchmark_parameters$threads = 2`, then the benchmark analysis will do 3 runs at a time, and each run will use 2 threads.
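In other words, the number of concurrent runs is presumably just the integer division of the two settings. A minimal sketch (the helper function here is hypothetical, not part of the benchmark code):

```r
# Hypothetical helper illustrating the thread arithmetic described above:
# the total available threads divided by the threads per run gives the
# number of benchmark runs that can execute concurrently.
n_concurrent_runs <- function(total_threads, threads_per_run) {
  floor(total_threads / threads_per_run)
}

n_concurrent_runs(6, 2)  # 3 runs at a time, each using 2 threads
n_concurrent_runs(10, 1) # 10 runs at a time, each using 1 thread
```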
Thanks Jeff! Will we still have an issue with lpsymphony if we do

```r
general_parameters$threads <- 10
benchmark_parameters$threads <- 1
```

That's what I have running now and it seems to work.
> I'm currently using a `general_parameters$threads` of 10. As you can see below, RAM keeps creeping up. It's now at 220 GB. Do you have ideas on how to reduce the memory footprint of the analysis? It would be great to use more than 10 threads.
Hmm, I'll take a look and see what I can do. Maybe we can include some manual garbage collection or something. In my experience, the solvers tend to use much more memory than the R process containing the data so there might not be too much we can do. I'll see if I can reduce memory consumption on the R side, and post an update later today.
If you have a chance, can you see how much memory Rsymphony and lpsymphony consume when they try to solve the largest size problem? If I remember correctly, these solvers don't use memory as efficiently as cplex or Gurobi --- so we might have to exclude these solvers from large problem sizes to reduce memory consumption.
> Thanks Jeff! Will we still have an issue with lpsymphony if we do `general_parameters$threads <- 10; benchmark_parameters$threads <- 1`? That's what I have running now and it seems to work.
Yeah, we'll still have the issue because lpsymphony will always use the maximum number of threads. We have no way of controlling the number of threads that lpsymphony uses -- it just uses all of them up.
> > I'm currently using a `general_parameters$threads` of 10. As you can see below, RAM keeps creeping up. It's now at 220 GB. Do you have ideas on how to reduce the memory footprint of the analysis? It would be great to use more than 10 threads.
>
> Hmm, I'll take a look and see what I can do. Maybe we can include some manual garbage collection or something. In my experience, the solvers tend to use much more memory than the R process containing the data, so there might not be too much we can do. I'll see if I can reduce memory consumption on the R side, and post an update later today.
>
> If you have a chance, can you see how much memory Rsymphony and lpsymphony consume when they try to solve the largest size problem? If I remember correctly, these solvers don't use memory as efficiently as cplex or Gurobi --- so we might have to exclude these solvers from large problem sizes to reduce memory consumption.
Thanks Jeff! Just as an FYI, I started the benchmarks 7 hours ago and so far 129 tifs have been saved.
> > Thanks Jeff! Will we still have an issue with lpsymphony if we do `general_parameters$threads <- 10; benchmark_parameters$threads <- 1`? That's what I have running now and it seems to work.
>
> Yeah, we'll still have the issue because lpsymphony will always use the maximum number of threads. We have no way of controlling the number of threads that lpsymphony uses -- it just uses all of them up.
Interesting. This doesn't seem to be the case on my system. Each R instance that's running uses 1 thread (10 instances running).
Do you know if it's running an lpsymphony solve? I guess it's possible that a random selection of 10 runs may not include any lpsymphony runs. For instance, if I run the following example code:
```r
library(prioritizr)
data(sim_pu_raster, sim_features)

# create problem
p <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.05) %>%
  add_proportion_decisions() %>%
  add_lpsymphony_solver(time_limit = 5, verbose = FALSE)

# generate solution
s <- solve(p)
```
I see this printed to my console:

```
Automatically setting number of threads to 4
```
After the `solve` command?
On my console nothing is printed. It just solves and quickly finishes with no output.
Yeah - that's after running the solve command.
Interesting - maybe it's because we're running different versions of SYMPHONY? What if you try it on your linux computer (which presumably has a more recent version of SYMPHONY via the Ubuntu repo)?
Probably different SYMPHONY versions. I don't see the output on either my Linux machine or the Windows VM (just tried Windows as well).
Oh - interesting - I just assumed that lpsymphony always used parallel processing.
Ok I've just pushed a commit to try and improve speed and memory consumption. Note that you will find much less memory consumption on a Linux/Mac system because they can use FORK clusters which can share memory between parallel workers.
For example, if you are using a PSOCK cluster (which we have to use on Windows because it can't use FORK clusters), then if you have 10 workers you need 10x copies of all datasets. But if you have a FORK cluster, it can share memory between workers, so if you have 10 workers you still only need 1x copies of the datasets.
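The difference can be sketched with the base `parallel` package (the cluster sizes here are just illustrative):

```r
library(parallel)

# PSOCK cluster: each worker is a separate R process, so any data it
# needs must be exported to it, giving each worker its own copy
# (10 workers => up to 10x the memory for the data).
cl_psock <- makeCluster(10, type = "PSOCK")
stopCluster(cl_psock)

# FORK cluster (Linux/Mac only): workers are forked from the master
# process and share its memory copy-on-write, so read-only data is
# not duplicated across workers.
cl_fork <- makeCluster(10, type = "FORK")
stopCluster(cl_fork)
```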
It's probably best to wait till we've addressed https://github.com/prioritizr/prioritizr/issues/183 before trying this though, because the current implementation of `add_lpsymphony_solver` will still consume a ton of memory (even with these updates).

Also, I'll need to update the version of prioritizr in packrat before the benchmarking code can use it.
Thanks very much Jeff! I'm using a Linux machine, so that should help. Is there anything I need to do for FORK clusters on Linux or does that work 'out of the box'?
Might be a good time for me to grab a new copy of the benchmark repo when things are updated, to give it a fresh start on my machine.
> Thanks very much Jeff! I'm using a Linux machine, so that should help. Is there anything I need to do for FORK clusters on Linux or does that work 'out of the box'?
Ah ok - excellent! No, it should "just work". If it doesn't -- then that's a bug -- so let me know and I can fix it. I've tested it on my Linux system so hopefully it will work though.
> Might be a good time for me to grab a new copy of the benchmark repo when things are updated, to give it a fresh start on my machine.
Yeah - good idea. I'll let you know when everything's ready to restart the benchmarks.
> > Thanks very much Jeff! I'm using a Linux machine, so that should help. Is there anything I need to do for FORK clusters on Linux or does that work 'out of the box'?
>
> Ah ok - excellent! No, it should "just work". If it doesn't -- then that's a bug -- so let me know and I can fix it. I've tested it on my Linux system so hopefully it will work though.
How could I test if FORK clusters are working?
Good question - I'll add an update to print cluster information to the log files.
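In the meantime, one quick functional check is to exploit the memory sharing itself: forked workers can see objects in the master session without an explicit export. A sketch (on a PSOCK cluster the same call would fail unless `x` were exported first):

```r
library(parallel)

x <- runif(1e6)                      # object created in the master session
cl <- makeCluster(2, type = "FORK")  # Linux/Mac only

# Forked workers inherit the master's memory, so they can see `x`
# without clusterExport(); this confirms the cluster really is forked.
clusterEvalQ(cl, length(x))

stopCluster(cl)
```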
Thanks!
Woops, I didn't mean to close
I've just added the cluster info to the log files.
Ok - I've just updated the version of prioritizr in packrat. @ricschuster, if you clone the benchmark repo again and rerun the analyses, hopefully the memory consumption should be less now? Let me know if something goes wrong.
Thanks Jeff. Memory use is still high with 10 workers. Currently at 230 GB.

One thing I noticed in `benchmark.toml` was that you have replicates in there. Do you think we should have replicates for this? With a total of 1500 benchmark scenarios, this will likely take a week to run.

I was also thinking we could ditch the relative targets of 0.01 and 0.05. What do you think?
Ah ok. Yeah, I reckon we could cut down the benchmark parameters a lot. What about this:
```toml
number_replicates = 1
relative_target = [0.1, 0.2, 0.3]
boundary_penalty_value = [0.0, 0.001]
solver = ["add_gurobi_solver", "add_rsymphony_solver", "add_cplex_solver", "add_cbc_solver", "add_lpsymphony_solver"]

## solver arguments
time_limit = 1e+3 # seconds
threads = 1
gap = [0.01]
```
We could also try reducing the time limit too?
> Ah ok. Yeah, I reckon we could cut down the benchmark parameters a lot. What about this:
>
> ```toml
> number_replicates = 1
> relative_target = [0.1, 0.2, 0.3]
> boundary_penalty_value = [0.0, 0.001]
> solver = ["add_gurobi_solver", "add_rsymphony_solver", "add_cplex_solver", "add_cbc_solver", "add_lpsymphony_solver"]
>
> ## solver arguments
> time_limit = 1e+3 # seconds
> threads = 1
> gap = [0.01]
> ```
That's pretty much what I started this morning. Just added another relative target (0.15). I think the time limit is useful for the bigger problems to see if open source solvers can find solutions, just taking more time.
Ah ok - sounds good. We could also try relaxing the gap, e.g. to 0.1?
> Ah ok - sounds good. We could also try relaxing the gap, e.g. to 0.1?
Good idea. Current setup should finish in a day, so not too bad for now.
Two runs got stuck for about a day for some reason, but it's finished now and the results are in a new release, v0.0.3.

What do you think the next steps are? Will you put together a prioritizr vignette that ingests the benchmark results? (I haven't looked at the results yet and am very curious to see how things compare.)
Ok - great - thanks! I'll create a branch for prioritizr with a template vignette for you to write up. I'll include some code at the beginning of the template vignette to fetch the benchmark results and import them.
What do you think the best way to manage multiple benchmark runs is? I originally thought that we could just upload the benchmark results (using the command `make export`) to release v0.0.2 whenever we did a new run (thus overwriting previous releases) and use the data from v0.0.2 for the vignette. Or do you think it's worth adding a new release with each run, and always referring to the latest version for the vignette? What do you think?
Sounds good re: the prioritizr branch.

`make export` causes an error for me (GitHub token issue), but that's beside the point I guess.

Is it possible for the vignette to just grab the latest release, and we can increment the release as we update the benchmark results? It would be kind of neat to have one release per benchmark run (I'm assuming we rarely update benchmarks). If we had a schedule for benchmarks (say every 6 months or so), we would have nice time-stamped snapshots easily available in the releases. Not sure, maybe overwriting releases works too. What do you think?
Yeah, I like the idea of creating a new release for each benchmark run (also means we don't lose previous runs if something goes wrong). Let's do that. I have this feeling/notion that we might have to make additional commits to create new releases (in other words, we can't make 2 releases for the same commit). But we can easily get around that if we just add a date for the latest benchmark in the README.
Perfect!
The results.rda file is pretty huge (700 MB). Would it be OK to save the final results.rda file in a compressed format?
E.g. for future runs I mean
Oh wait - it is already in a compressed format
Was just about to write that
Ah - sorry this is my fault. I forgot to add some clean up code to delete the large conservation planning dataset objects. I'll add that now. This should reduce the file size for future runs.
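A cleanup along those lines might look something like this (a sketch only; the object names are hypothetical placeholders for the large planning datasets):

```r
# After a benchmark run's results have been saved, drop the large
# objects and explicitly trigger garbage collection so the memory is
# returned before the next run starts.
rm(planning_data, benchmark_problem)  # hypothetical object names
gc()
```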
I've done some testing now and, I have to admit, I'm puzzled by the results.

Here's the first setup:

```
number_of_planning_units = 102242
number_features = 72
relative_target = 0.1
boundary_penalty = 0
```

The timing results I got were: Gurobi = 25 s, RSymphony = 314 s, Cplex = 18 s, CBC = 9.6 s, lpsymphony = 338 s.

In the results table, the Gurobi, Cplex, and CBC numbers match these. RSymphony (8.1 s) and lpsymphony (6.8 s) are way off.
For the second setup:

```
number_of_planning_units = 606180
number_features = 72
relative_target = 0.1
boundary_penalty = 0
```

The timing results I got were: Gurobi = 290 s, RSymphony = 1841 s, Cplex = 208 s, CBC = 59 s, lpsymphony = 1230 s.

None of the `run_time` values in the results table match (the closest is Gurobi with 431 s).

I've used the code in `03-analysis.R` for the tests, so the code itself should be okay. For timing, I did use the `system.time` command wrapped around `prioritizr::solve`.
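The two measurements being compared can be reproduced along these lines (a sketch using the simulated datasets that ship with prioritizr; the solver choice and timings are illustrative):

```r
library(prioritizr)
data(sim_pu_raster, sim_features)

p <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1) %>%
  add_default_solver(verbose = FALSE)

# wall-clock time for the whole solve() call, including pre-processing
# and formatting of the output
elapsed <- system.time(s <- solve(p))[["elapsed"]]

# solver-reported runtime stored as an attribute on the solution,
# covering only the optimization itself
solver_time <- as.numeric(attr(s, "runtime")[[1]])
```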
For the timing results, it seems to be an issue with the `runtime` attribute of the solution. For an example I just ran, I get a `system.time` of 257 s and an `as.numeric(attr(s, "runtime")[[1]])` of 3.91.

What I still don't understand is why the `time_limit` argument gets ignored. Do you have any ideas about that one?
Not entirely related, but as I'm just checking out the 2.3 million planning unit scenario: Gurobi = 1224 s, CBC = 257 s.

This is awesome! I think this might only hold for 'basic' min set problems though, and maybe up to a certain number of features (which I haven't figured out yet). Regardless, great news for Marxan-type analyses.
Hmm, I would expect the `runtime` timings to be shorter than `system.time(solve())` (but maybe not with differences as large as you report).

This is because the run time for `system.time(solve())` will include (1) some preliminary calculations, (2) converting the problem to ILP format, (3) generating a solution, and (4) formatting the solution for output (e.g. in `data.frame` format). The `runtime` timings (and also the `time_limit` arguments) are only based on step (3). So, we might expect a non-trivial difference between your method and the `runtime` numbers for large datasets, where steps (1), (2), and (4) might take a while to complete.

I could try updating the `runtime` methods in prioritizr to also use `system.time(...)` to be consistent with your timing method? Would that be helpful?
> Not entirely related, but as I'm just checking out the 2.3 million planning unit scenario: Gurobi = 1224 s, CBC = 257 s. This is awesome!

Yeah, that is awesome!
I've managed to get the benchmark analysis running on a server, so I think it's all working now. @ricschuster, when you get a chance, could you please try running it on your system and see if it works? The Makefile is currently configured to run a small pared-down version of the analysis to help identify errors/issues quickly. So, if you use the system command `make clean all`, that should be a good test? Once we've verified that it works correctly, I'll update the parameters in the Makefile to run the full analysis. How does that sound?