prioritizr / benchmark

Benchmark the performance of exact algorithm solvers for conservation planning
GNU General Public License v3.0

Attempt benchmarks #4

Closed jeffreyhanson closed 3 years ago

jeffreyhanson commented 3 years ago

I've managed to get the benchmark analysis running on a server, so I think it's all working now. @ricschuster, when you get a chance, could you please try running it on your system and see if it works? The Makefile is currently configured to run a small, pared-down version of the analysis to help identify errors/issues quickly. So, if you run the system command make clean all, that should be a good test. Once we've verified that it works correctly, I'll update the parameters in the Makefile to run the full analysis. How does that sound?

jeffreyhanson commented 3 years ago

Ok - can you please pull the latest version and try it again?

ricschuster commented 3 years ago

Alright, this is working now. Thanks very much for figuring this out!

ricschuster commented 3 years ago

I've had the benchmarking (03-analysis.R) running overnight, and so far only 23 runs have saved results.

The beginning of this code snippet (starting at line 60) suggests that it's not running things in parallel (.parallel = FALSE). Is that the case? If so, is there a reason for this, or could we run things in parallel? That would really speed things up here.

benchmark_results <-
  benchmark_results %>%
  dplyr::sample_frac() %>% # randomize benchmark order
  dplyr::mutate(id2 = seq_len(nrow(.))) %>%
  plyr::ddply(
    "id2",
    .parallel = FALSE, # exists("cl"),
    .progress = ifelse(exists("cl"), "none", "text")
    # ... (remaining arguments and processing function omitted)
  )

ricschuster commented 3 years ago

Does general_parameters$threads determine the number of parallel processes?

I thought this would be related to how many threads prioritizr uses, but I guess that's not the case. Is that correct?

ricschuster commented 3 years ago

I'm currently using a general_parameters$threads value of 10. As you can see below, RAM usage keeps creeping up; it's now at 220 GB. Do you have any ideas for reducing the memory footprint of the analysis? It would be great to use more than 10 threads.

[screenshot: system monitor showing RAM usage steadily increasing to 220 GB]

jeffreyhanson commented 3 years ago

The beginning of this code snippet (starting at line 60) suggests that it's not running things in parallel (.parallel = FALSE). Is that the case? If so, is there a reason for this, or could we run things in parallel? That would really speed things up here.

Yeah, that's correct. Including lpsymphony will fork-bomb the benchmark analysis (i.e., it sets its thread count to the total number of cores on the system, so if you try to run other solvers at the same time, the total number of threads in use is much greater than the number of cores; per #5). So, we can either include lpsymphony or run the benchmark runs in parallel --- we can't do both (unless lpsymphony gets an update).
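
To make the oversubscription concrete, here's a minimal sketch (the numbers are hypothetical, not from our servers):

# lpsymphony ignores any per-run thread setting and grabs every core,
# so running it alongside parallel benchmark workers oversubscribes
# the machine (hypothetical numbers):
total_cores <- 16                 # cores on the machine
n_parallel_runs <- 10             # benchmark runs executing at once
threads_per_lpsymphony_run <- total_cores
n_parallel_runs * threads_per_lpsymphony_run  # 160 threads on 16 cores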

jeffreyhanson commented 3 years ago

Does general_parameters$threads determine the number of parallel processes?

I thought this would be related to how many threads prioritizr uses, but I guess that's not the case. Is that correct?

general_parameters$threads specifies the total number of available threads, and benchmark_parameters$threads specifies the number of threads to use per run. E.g., if general_parameters$threads = 6 and benchmark_parameters$threads = 2, then the benchmark analysis will do 3 runs at a time, and each run will use 2 threads.
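
In code terms (a toy illustration of the arithmetic, reusing the config names):

# how the two settings interact (illustration only)
general_parameters <- list(threads = 6)    # total threads available
benchmark_parameters <- list(threads = 2)  # threads used by each run
general_parameters$threads %/% benchmark_parameters$threads  # 3 concurrent runs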

ricschuster commented 3 years ago

Thanks Jeff! Will we still have an issue with lpsymphony if we do

general_parameters$threads <- 10
benchmark_parameters$threads <- 1

That's what I have running now and it seems to work.

jeffreyhanson commented 3 years ago

I'm currently using a general_parameters$threads value of 10. As you can see below, RAM usage keeps creeping up; it's now at 220 GB. Do you have any ideas for reducing the memory footprint of the analysis? It would be great to use more than 10 threads.

Hmm, I'll take a look and see what I can do. Maybe we can include some manual garbage collection or something. In my experience, the solvers tend to use much more memory than the R process containing the data, so there might not be much we can do. I'll see if I can reduce memory consumption on the R side, and post an update later today.

If you have a chance, can you see how much memory Rsymphony and lpsymphony consume when they try to solve the largest problem size? If I remember correctly, these solvers don't use memory as efficiently as CPLEX or Gurobi --- so we might have to exclude them from the large problem sizes to reduce memory consumption.
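
For the R-side picture, something like this sketch can help (gc() only tracks R's own allocations, so memory used inside the solver libraries won't show up --- watching the process in top/htop is more reliable for that; the package's example data stand in for the real benchmark datasets here):

library(prioritizr)
data(sim_pu_raster, sim_features)

# build a small problem with the SYMPHONY-based solver
p <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1) %>%
  add_rsymphony_solver(verbose = FALSE)

gc(reset = TRUE)  # reset the "max used" counters
s <- solve(p)
gc()              # the "max used" column reports the R-side peak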

jeffreyhanson commented 3 years ago

Thanks Jeff! Will we still have an issue with lpsymphony if we do

general_parameters$threads <- 10
benchmark_parameters$threads <- 1

That's what I have running now and it seems to work.

Yeah, we'll still have the issue because lpsymphony will always use the maximum number of threads. We have no way of controlling the number of threads that lpsymphony uses -- it just uses all of them up.

ricschuster commented 3 years ago

I'm currently using a general_parameters$threads value of 10. As you can see below, RAM usage keeps creeping up; it's now at 220 GB. Do you have any ideas for reducing the memory footprint of the analysis? It would be great to use more than 10 threads.

Hmm, I'll take a look and see what I can do. Maybe we can include some manual garbage collection or something. In my experience, the solvers tend to use much more memory than the R process containing the data, so there might not be much we can do. I'll see if I can reduce memory consumption on the R side, and post an update later today.

If you have a chance, can you see how much memory Rsymphony and lpsymphony consume when they try to solve the largest problem size? If I remember correctly, these solvers don't use memory as efficiently as CPLEX or Gurobi --- so we might have to exclude them from the large problem sizes to reduce memory consumption.

Thanks Jeff! Just as an FYI, I started the benchmarks 7 hours ago and so far 129 .tif files have been saved.

ricschuster commented 3 years ago

Thanks Jeff! Will we still have an issue with lpsymphony if we do

general_parameters$threads <- 10
benchmark_parameters$threads <- 1

That's what I have running now and it seems to work.

Yeah, we'll still have the issue because lpsymphony will always use the maximum number of threads. We have no way of controlling the number of threads that lpsymphony uses -- it just uses all of them up.

Interesting. That doesn't seem to be the case on my system: each running R instance uses 1 thread (with 10 instances running).

jeffreyhanson commented 3 years ago

Do you know if it's running an lpsymphony solve? I guess it's possible that a random selection of 10 runs may not include any lpsymphony runs. For instance, if I run the following example code:

library(prioritizr)
data(sim_pu_raster, sim_features)

# create problem
p <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.05) %>%
  add_proportion_decisions() %>%
  add_lpsymphony_solver(time_limit = 5, verbose = FALSE)

# generate solution
s <- solve(p)

I see this printed to my console:

Automatically setting number of threads to 4

ricschuster commented 3 years ago

After the solve command?

On my console, nothing is printed; it just solves and quickly finishes with no output.

jeffreyhanson commented 3 years ago

Yeah - that's after running the solve command.

Interesting - maybe it's because we're running different versions of SYMPHONY? What if you try it on your Linux computer (which presumably has a more recent version of SYMPHONY via the Ubuntu repo)?

ricschuster commented 3 years ago

Probably different SYMPHONY versions. I don't see the output on either my Linux machine or the Windows VM (I just tried Windows as well).
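
For comparing setups, the R-side package versions are at least easy to check (whether they map cleanly onto the bundled SYMPHONY library version is an assumption):

# report package versions and platform details on each machine
packageVersion("lpsymphony")
packageVersion("Rsymphony")
sessionInfo()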

jeffreyhanson commented 3 years ago

Oh - interesting - I just assumed that lpsymphony always used parallel processing.

jeffreyhanson commented 3 years ago

Ok, I've just pushed a commit to try to improve speed and memory consumption. Note that you will see much lower memory consumption on a Linux/Mac system, because those systems can use FORK clusters, which share memory between parallel workers.

For example, if you are using a PSOCK cluster (which we have to use on Windows, because it can't use FORK clusters) and you have 10 workers, you need 10 copies of every dataset. But if you have a FORK cluster, memory is shared between workers, so with 10 workers you still only need 1 copy of the datasets.
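
For reference, both cluster types come from the base parallel package (a minimal sketch; the FORK call only works on Unix-alikes):

library(parallel)

# Windows-compatible: each worker is a separate R process, so any data
# shipped to the workers is copied once per worker.
cl <- makeCluster(10, type = "PSOCK")
stopCluster(cl)

# Linux/Mac only: workers are forked from the current process and share
# memory pages until they write to them.
cl <- makeCluster(10, type = "FORK")
stopCluster(cl)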

jeffreyhanson commented 3 years ago

It's probably best to wait till we've addressed https://github.com/prioritizr/prioritizr/issues/183 before trying this though, because the current implementation of add_lpsymphony_solver will still consume a ton of memory (even with these updates).

jeffreyhanson commented 3 years ago

Also, I'll need to update the version of prioritizr in packrat before the benchmarking code can use it.

ricschuster commented 3 years ago

Thanks very much Jeff! I'm using a Linux machine, so that should help. Is there anything I need to do for FORK clusters on Linux or does that work 'out of the box'?

ricschuster commented 3 years ago

Might be a good time for me to grab a new copy of the benchmark repo when things are updated, to give it a fresh start on my machine.

jeffreyhanson commented 3 years ago

Thanks very much Jeff! I'm using a Linux machine, so that should help. Is there anything I need to do for FORK clusters on Linux or does that work 'out of the box'?

Ah ok - excellent! No, it should "just work". If it doesn't -- then that's a bug -- so let me know and I can fix it. I've tested it on my Linux system so hopefully it will work though.

jeffreyhanson commented 3 years ago

Might be a good time for me to grab a new copy of the benchmark repo when things are updated, to give it a fresh start on my machine.

Yeah - good idea. I'll let you know when everything's ready to restart the benchmarks.

ricschuster commented 3 years ago

Thanks very much Jeff! I'm using a Linux machine, so that should help. Is there anything I need to do for FORK clusters on Linux or does that work 'out of the box'?

Ah ok - excellent! No, it should "just work". If it doesn't -- then that's a bug -- so let me know and I can fix it. I've tested it on my Linux system so hopefully it will work though.

How could I test if FORK clusters are working?

jeffreyhanson commented 3 years ago

Good question - I'll add an update to print cluster information to the log files.
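
In the meantime, a quick manual check along these lines should work (a sketch; testing the node class is an assumption based on the parallel package's internals):

library(parallel)
cl <- makeCluster(2, type = "FORK")
print(cl)                      # summarizes the cluster
inherits(cl[[1]], "forknode")  # TRUE for FORK workers, FALSE for PSOCK
stopCluster(cl)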

ricschuster commented 3 years ago

Thanks!

jeffreyhanson commented 3 years ago

Whoops, I didn't mean to close this.

jeffreyhanson commented 3 years ago

I've just added the cluster info to the log files.

jeffreyhanson commented 3 years ago

Ok - I've just updated the version of prioritizr in packrat. @ricschuster, if you clone the benchmark repo again and rerun the analyses, hopefully the memory consumption will be lower now. Let me know if something goes wrong.

ricschuster commented 3 years ago

Thanks Jeff. Memory use is still high with 10 workers; it's currently at 230 GB.

One thing I noticed in benchmark.toml is that you have replicates in there. Do you think we need replicates for this?

With a total of 1,500 benchmark scenarios, this will likely take a week to run.

I was also thinking we could ditch the relative targets of 0.01 and 0.05.

What do you think?

jeffreyhanson commented 3 years ago

Ah ok. Yeah, I reckon we could cut down the benchmark parameters a lot. What about this:

number_replicates = 1
relative_target = [0.1, 0.2, 0.3]
boundary_penalty_value = [0.0, 0.001]
solver = ["add_gurobi_solver", "add_rsymphony_solver", "add_cplex_solver", "add_cbc_solver", "add_lpsymphony_solver"]
## solver arguments
time_limit = 1e+3 # seconds
threads = 1
gap = [0.01]
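
That config implies a much smaller run count per problem size (back-of-the-envelope arithmetic, assuming the analysis fully crosses these parameters):

# runs per problem size under the proposed config
n_runs <-
  1 *  # number_replicates
  3 *  # relative_target values
  2 *  # boundary_penalty_value values
  5 *  # solver values
  1    # gap values
n_runs  # 30 runs per problem size
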
jeffreyhanson commented 3 years ago

We could also try reducing the time limit?

ricschuster commented 3 years ago

Ah ok. Yeah, I reckon we could cut down the benchmark parameters a lot. What about this:

number_replicates = 1
relative_target = [0.1, 0.2, 0.3]
boundary_penalty_value = [0.0, 0.001]
solver = ["add_gurobi_solver", "add_rsymphony_solver", "add_cplex_solver", "add_cbc_solver", "add_lpsymphony_solver"]
## solver arguments
time_limit = 1e+3 # seconds
threads = 1
gap = [0.01]

That's pretty much what I started this morning; I just added another relative target (0.15). I think the time limit is useful for the bigger problems, to see whether the open source solvers can still find solutions when given more time.

jeffreyhanson commented 3 years ago

Ah ok - sounds good. We could also try relaxing the gap, e.g. to 0.1?

ricschuster commented 3 years ago

Ah ok - sounds good. We could also try relaxing the gap, e.g. to 0.1?

Good idea. The current setup should finish in a day, so it's not too bad for now.

ricschuster commented 3 years ago

Two runs got stuck for about a day for some reason, but it's finished now, and the results are in a new release (v0.0.3). What do you think the next steps are? Will you put together a prioritizr vignette that ingests the benchmark results?

(I haven't looked at the results yet and am very curious to see how things compare.)

jeffreyhanson commented 3 years ago

Ok - great - thanks! I'll create a branch for prioritizr with a template vignette for you to write up. I'll include some code at the beginning of the template vignette to fetch the benchmark results and import them.

What do you think is the best way to manage multiple benchmark runs? I originally thought that we could just upload the benchmark results (using the command make export) to release v0.0.2 whenever we did a new run (thus overwriting previous releases) and use the data from v0.0.2 for the vignette. Or do you think it's worth adding a new release with each run, and always referring to the latest version for the vignette?

ricschuster commented 3 years ago

Sounds good re: prioritizr branch.

make export causes an error for me (GitHub token issue), but that's beside the point I guess. Is it possible for the vignette to just grab the latest release, so we can increment the release as we update the benchmark results? It would be kind of neat to have one release per benchmark run (I'm assuming we'll rarely update the benchmarks). If we had a schedule for benchmarks (say, every 6 months or so), we would have nice time-stamped snapshots easily available in the releases. Not sure, maybe overwriting releases works too.

What do you think?

jeffreyhanson commented 3 years ago

Yeah, I like the idea of creating a new release for each benchmark run (also means we don't lose previous runs if something goes wrong). Let's do that. I have this feeling/notion that we might have to make additional commits to create new releases (in other words, we can't make 2 releases for the same commit). But we can easily get around that if we just add a date for the latest benchmark in the README.

ricschuster commented 3 years ago

Perfect!

jeffreyhanson commented 3 years ago

The results.rda file is pretty huge (700 MB). Would it be OK to save the final results.rda file in a compressed format?

jeffreyhanson commented 3 years ago

E.g., for future runs, I mean.

jeffreyhanson commented 3 years ago

Oh wait - it is already in a compressed format

ricschuster commented 3 years ago

Was just about to write that

jeffreyhanson commented 3 years ago

Ah - sorry, this is my fault. I forgot to add some clean-up code to delete the large conservation planning dataset objects. I'll add that now. This should reduce the file size for future runs.
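
Something along these lines, for the record (a sketch with a hypothetical object name, not the actual commit):

# drop the large planning-data objects before saving the results
rm(planning_data)  # `planning_data`: hypothetical large dataset object
gc()               # return the freed memory to the OS
save(benchmark_results, file = "results.rda", compress = "xz")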

ricschuster commented 3 years ago

I've done some testing now, and I have to admit I'm puzzled by the results.

Here's the first setup:

number_of_planning_units = 102242
number_features = 72
relative_target = 0.1
boundary_penalty = 0

The timing results I got were: Gurobi = 25 s, RSymphony = 314 s, CPLEX = 18 s, CBC = 9.6 s, lpsymphony = 338 s.

In the results table, Gurobi, CPLEX, and CBC match these numbers, but RSymphony (8.1 s) and lpsymphony (6.8 s) are way off.

For the second setup:

number_of_planning_units = 606180
number_features = 72
relative_target = 0.1
boundary_penalty = 0

The timing results I got were: Gurobi = 290 s, RSymphony = 1841 s, CPLEX = 208 s, CBC = 59 s, lpsymphony = 1230 s.

None of the run_time values in the results table match (the closest is Gurobi with 431 s).

I used the code in 03-analysis.R for the tests, so the code itself should be okay. For timing, I wrapped the system.time command around prioritizr::solve. For the timings in the results table, the issue seems to be with the runtime attribute of the solution.

As an example, for a run I just did, I get a system.time of 257 s but an as.numeric(attr(s, "runtime")[[1]]) of 3.91 s.

What I still don't understand is why the time_limit argument gets ignored. Do you have any ideas about that one?
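
For reference, the comparison I'm making boils down to this (a sketch; the package's built-in example data stand in for the real benchmark datasets):

library(prioritizr)
data(sim_pu_raster, sim_features)

# build a problem as in 03-analysis.R (simplified here)
p <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1) %>%
  add_default_solver(verbose = FALSE)

# wall time for the whole solve() call vs the solver-reported runtime
timing <- system.time(s <- solve(p))
timing["elapsed"]                    # e.g. 257 s on the big dataset
as.numeric(attr(s, "runtime")[[1]])  # e.g. 3.91 s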

ricschuster commented 3 years ago

Not entirely related, but I'm just checking out the 2.3 million planning unit scenario:

Gurobi: 1224 s, CBC: 257 s

This is awesome! I think this might only hold for 'basic' minimum set problems, though, and maybe only up to a certain number of features (which I haven't figured out yet). Regardless, it's great news for Marxan-type analyses.

jeffreyhanson commented 3 years ago

Hmm, I would expect the runtime timings to be shorter than system.time(solve()) (though maybe not by differences as large as you report).

This is because the run time for system.time(solve()) will include (1) some preliminary calculations, (2) converting the problem to ILP format, (3) generating a solution, and (4) formatting the solution for output (e.g. in data.frame format). The runtime timings (and also the time_limit arguments) are only based on step (3). So, we might expect a non-trivial difference between your method and the runtime numbers for large datasets, where steps (1), (2), and (4) might take a while to complete.

I could try updating the runtime methods in prioritizr to also use system.time(...) to be consistent with your timing method? Would that be helpful?

jeffreyhanson commented 3 years ago

Not entirely related, but I'm just checking out the 2.3 million planning unit scenario:

Gurobi: 1224 s, CBC: 257 s

This is awesome!

Yeah, that is awesome!