prioritizr / benchmark

Benchmark the performance of exact algorithm solvers for conservation planning
GNU General Public License v3.0

Attempt benchmarks 2 #7

Closed ricschuster closed 2 years ago

ricschuster commented 3 years ago

Thanks Jeff! Just starting this new issue here to continue #4

jeffreyhanson commented 3 years ago

Awesome - thanks!

jeffreyhanson commented 3 years ago

@ricschuster, I've just pushed a new commit to this repo with the new rcbc package version. This won't affect anything when running the benchmarks on your Ubuntu system, but if you tried running the benchmarks on your Windows system, then it will use CBC version 2.10.5. So, hopefully, we should now see the same results on Windows as Ubuntu?

ricschuster commented 3 years ago

Thanks very much @jeffreyhanson. Maybe I should test a reduced benchmark version on Windows. A CBC question for you: What do Windows users need to do to use CBC 2.10.5 at this point? I basically want to get a sense of whether it's easy enough so we can promote the solver to prioritizr users.

jeffreyhanson commented 3 years ago

No worries - yeah, that's a great idea.

Sorry, I wasn't clear. RWinLib now has the Windows binary files for CBC 2.10.5 and rcbc has been updated to use these binary files (instead of the older version). So, all a Windows user needs to do is install rcbc from GitHub (e.g. using remotes::install_github('dirkschumacher/rcbc')). Note that they still need to have Rtools installed on their computer.
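In other words, the whole install process on Windows is roughly the following (a minimal sketch, assuming Rtools is already installed and on the PATH):

```r
# minimal sketch of the Windows install steps described above;
# assumes Rtools is already installed and on the PATH
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("dirkschumacher/rcbc")
library(rcbc)
```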

ricschuster commented 3 years ago

Thanks very much. Just needing Rtools seems like a pretty reasonable requirement. What do you think about promoting the new version of prioritizr on Twitter and to the Marxan group? We could wait for rcbc to go to CRAN, but as you noted earlier, that might take quite some time.

jeffreyhanson commented 3 years ago

Yeah - great idea! I think the new version of prioritizr has some nice QOL features (e.g. the evaluation functions) that users might appreciate? Should the new CBC solver functionality be promoted too? On the one hand, it might be worth waiting till the benchmark vignette is finished before we promote the new CBC functionality --- so we have some stats/graphs for interested readers? On the other hand, I guess there's no harm in saying that preliminary analysis shows that it's pretty fast and it might help people who can't use Gurobi/CPLEX? What do you think? Also, if it's helpful, I could compile a list of the main benefits/new features for you to include when promoting it?

jeffreyhanson commented 3 years ago

Sorry, I just had a thought - it's probably worth checking that the CBC solver functionality is pretty fast on Windows with CBC 2.10.5 before promoting it? I suspect a good chunk of the users have Windows systems, so it's probably worth making sure that the performance we see on Ubuntu is still achieved on Windows?

ricschuster commented 3 years ago

If you could compile a list of the main benefits/new features, that would be awesome!

For CBC, it would definitely be good to have the benchmarks finished first. That's going slowly right now: only 4 scenarios completed today (167/480 complete). Great idea about testing CBC on Windows first. I will do that. Fingers crossed it's comparable to Ubuntu.

jeffreyhanson commented 3 years ago

Ok - sounds great - thank you so much for leading the benchmark stuff!

Here's a list of the main benefits/new features in prioritizr (in order of importance according to user needs, though this is just my opinion):

  1. New family of functions for evaluating solutions using summary statistics (e.g. calculating solution cost and total boundary length; https://prioritizr.net/reference/summaries.html).
  2. New add_cbc_solver function for solving problems with the blazing fast, open source CBC solver (https://prioritizr.net/reference/add_cbc_solver.html); see the sketch after this list.
  3. Update add_lpsymphony_solver and add_rsymphony_solver functions so their gap parameter specifies the relative optimality gap (similar to the Gurobi and CPLEX solvers). This is more of a bug fix than a "new feature" per se -- so we should probably be careful how we describe this?
  4. Update add_lpsymphony_solver to be more memory efficient.
  5. Rename functions that evaluate planning unit importance for consistency (https://prioritizr.net/reference/importance.html).
  6. Assorted improvements to documentation, examples, and error messages.
  7. Reduced package installation time.
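For example, here's a rough sketch of how items 1 and 2 fit together (it assumes the sim_pu_raster and sim_features example datasets that ship with prioritizr, and the 10% target is just a placeholder):

```r
# rough sketch combining the new CBC solver with the new evaluation functions;
# sim_pu_raster and sim_features are the example datasets shipped with prioritizr
library(prioritizr)
data(sim_pu_raster, sim_features)

# build and solve a minimum set problem with the open source CBC solver
p <- problem(sim_pu_raster, sim_features) %>%
  add_min_set_objective() %>%
  add_relative_targets(0.1) %>%
  add_binary_decisions() %>%
  add_cbc_solver(gap = 0)
s <- solve(p)

# summarize the solution with the new evaluation functions
print(eval_cost_summary(p, s))
print(eval_target_coverage_summary(p, s))
```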

ricschuster commented 3 years ago

Thanks very much Jeff!

The benchmarking is taking a lot longer than I'd hoped for, primarily related to boundary penalty factors and open source solvers.

I think it would still be good to show benchmark results when we promote the new prioritizr version, but that might take a while. Do you think we should send something around now just for the updates, or should we wait for the benchmark vignette?

jeffreyhanson commented 3 years ago

Yeah, I'm happy with either approach - so whatever you think is best. If you're not sure what to do either, then maybe we could set a time limit? E.g. we could see if the benchmark vignette can be completed in 2 weeks, and if it can, then let's announce the new version along with the benchmark. And if the benchmark vignette is still a work in progress by then, let's announce the new version anyway?

jeffreyhanson commented 3 years ago

Totally up to you though - I just wanted to suggest a third option in case it's helpful

ricschuster commented 3 years ago

I like it! Two weeks it is.

ricschuster commented 3 years ago

I've done a bit more testing now and it looks like using add_min_shortfall_objective in combination with add_boundary_penalties causes issues, because the boundary_penalty values we set are meant for add_min_set_objective. Even Gurobi has a really hard time finding a solution with pus = 12902. Do you have any ideas on how to find better boundary_penalty values for add_min_shortfall_objective?

jeffreyhanson commented 3 years ago

Yeah - I think you're exactly correct. I think we might have to set different BLM values for each objective. I could update the parameter file to allow you to do this - what do you think? If I did that, would you be able to play around with different values for the min shortfall objective?

ricschuster commented 3 years ago

Thanks Jeff! Let's set different BLMs per objective. Do you think BLMs would scale with the number of planning units (i.e. test on a small set and be okay on a bigger set), or might there be a scale issue with the BLM as well?

jeffreyhanson commented 3 years ago

Ok, sounds good. Hmmm, it might work (since the targets are also relative). Couldn't hurt I guess? Would it be easier to impose a (relatively) short time limit to make sure that a given set of BLM values works for the easier runs (e.g. fewer PUs) of the benchmark analysis? E.g. if a given set of BLM values doesn't work for even the easier runs, then we know it definitely needs changing to work for the larger runs?

ricschuster commented 3 years ago

Yeah, I was thinking about the time limit route as well. I was going to use just Gurobi to explore and set BLM values before expanding to the other solvers.
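Roughly something like this, I'm thinking (just a sketch using the prioritizr example data; the candidate BLM values, target, budget, gap, and time limit are placeholders):

```r
# sketch of the calibration idea: try a few candidate boundary penalty (BLM)
# values with Gurobi under a short time limit and record the runtimes;
# the candidate values, target, budget, gap, and time limit are placeholders
library(prioritizr)
library(raster)
data(sim_pu_raster, sim_features)

budget <- 0.1 * cellStats(sim_pu_raster, "sum")
candidate_blm <- c(0.0001, 0.001, 0.01, 0.1)

results <- lapply(candidate_blm, function(blm) {
  p <- problem(sim_pu_raster, sim_features) %>%
    add_min_shortfall_objective(budget = budget) %>%
    add_relative_targets(0.2) %>%
    add_boundary_penalties(penalty = blm) %>%
    add_binary_decisions() %>%
    add_gurobi_solver(gap = 0.1, time_limit = 600)
  elapsed <- system.time(s <- solve(p))[["elapsed"]]
  data.frame(blm = blm, runtime_s = elapsed)
})
do.call(rbind, results)
```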

jeffreyhanson commented 3 years ago

Ok - yeah, that sounds good. Excellent idea restricting it to just Gurobi to find useful BLM values.

jeffreyhanson commented 3 years ago

@ricschuster, I've just pushed a commit with the latest CRAN version of prioritizr and the ability to specify different BLM values in the benchmark.toml file. What do you think? Please let me know if it's not clear how to use it, or if you have any follow-up questions.

ricschuster commented 3 years ago

Awesome! Do I need to pull the entire repo from GitHub again because of the prioritizr update too?

jeffreyhanson commented 3 years ago

Yeah, if you want to use the new version of prioritizr -- but that's probably not needed just for finding good BLM values.

ricschuster commented 3 years ago

Thanks!

ricschuster commented 3 years ago

I think I've finally figured out the benchmark parameters. Gurobi, CPLEX, and CBC runs all completed in reasonable times today. Going to add lpsymphony and Rsymphony to the mix now and let things run over the weekend. Fingers crossed that this is it for benchmark runs.

jeffreyhanson commented 3 years ago

Ah ok - awesome - fingers crossed!

ricschuster commented 3 years ago

All 480 scenarios completed running and I've created a new pre-release based on them. I've updated the benchmark vignette with figures for min_set and will work on min_shortfall next. Getting really close to having everything together for this.

jeffreyhanson commented 3 years ago

Awesome - thanks! Yeah, it will be really exciting to see how the solvers compare with a different objective.

ricschuster commented 3 years ago

I've pushed a commit that now creates figures for both objective functions. If you knit benchmark.Rmd on the benchmark branch of prioritizr, you can have a look. The min_shortfall results are all over the place and I'm having a hard time interpreting them. The main takeaway for me is: CBC only performs well for min_set.

What do you think about the results?
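For anyone wanting to reproduce the figures, knitting the vignette locally is roughly the following (a sketch; it assumes a local checkout of the benchmark branch with the vignette at vignettes/benchmark.Rmd):

```r
# sketch: knit the benchmark vignette from a local checkout of the benchmark
# branch; the vignettes/benchmark.Rmd path is an assumption
rmarkdown::render("vignettes/benchmark.Rmd")
```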

jeffreyhanson commented 3 years ago

Awesome work!! I'm just tweaking the plots to add the stuff that I said I would add earlier (e.g. automatic unit conversions). Also, when I knit the Rmd to html, the paragraphs look a bit odd (e.g. most sentences are on a different line). I think this might be because most of the sentences start on a new line (probably to avoid lines exceeding 80 characters in length)? So, I'm also reformatting the text to avoid this issue.

jeffreyhanson commented 3 years ago

Ok, I've just pushed an updated version of the benchmark vignette. I've tried to include comments to help explain what's happening - but let me know if any changes are unclear?

ricschuster commented 3 years ago

This looks great, thanks very much for your updates!

As for the content: what are your thoughts on the min shortfall results? They are so all over the place (e.g. SYMPHONY outperforming the others in several cases) that I'm wondering if there is something strange going on. The min set results generally suggest that the setup is sound, but I'm really surprised at the lack of consistency. What do you think?

jeffreyhanson commented 3 years ago

Yeah, I'm surprised at the lack of consistency too. I can't think of any reason why it could be due to a bug (e.g. now that we've standardized the timing methods), but the possibility still remains. My impression is that SYMPHONY performs better than CBC for min shortfall with no boundary penalties, (generally) for min shortfall with low boundary penalties, and for min shortfall with small to moderate numbers of planning units and high boundary penalties. So, I guess I would generally recommend SYMPHONY for min shortfall over CBC, unless it takes ages, in which case trying CBC might be worth it. How does that sound?
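In code terms, that recommendation is just a matter of swapping the solver on the same problem (a sketch reusing the prioritizr example data; the budget, target, gap, and time limit values are placeholders):

```r
# sketch: swapping solvers is a one-line change on the same problem object;
# sim_pu_raster and sim_features are the prioritizr example datasets, and the
# budget, target, gap, and time limit values are placeholders
library(prioritizr)
library(raster)
data(sim_pu_raster, sim_features)

base <- problem(sim_pu_raster, sim_features) %>%
  add_min_shortfall_objective(budget = 0.1 * cellStats(sim_pu_raster, "sum")) %>%
  add_relative_targets(0.2) %>%
  add_binary_decisions()

# first choice for min shortfall problems: SYMPHONY
s_symphony <- solve(base %>% add_rsymphony_solver(gap = 0.1, time_limit = 600))

# fall back to CBC if SYMPHONY takes too long
s_cbc <- solve(base %>% add_cbc_solver(gap = 0.1, time_limit = 600))
```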

ricschuster commented 3 years ago

Yeah, I think you are right. The CBC recommendation really only makes sense for min set, which isn't a bad thing - min set is probably >90% of what people implement at the moment.

I'm kind of scared to think about CBC processing times for multi-zone problems now.

jeffreyhanson commented 3 years ago

Yeah, it will be interesting to see how CBC performs for multi-zone problems. At some point in the future, it might be worth extending the benchmarks to include multiple zones so we can get a better handle on this -- but I don't think it's a priority for now?

ricschuster commented 3 years ago

I agree, multi-zone benchmarking can wait. Let's get this vignette finished first and then promote the package updates.

ricschuster commented 3 years ago

I've pushed some text updates to the benchmark vignette and was wondering if you could have a look? We haven't used the raster results yet, but I'm wondering if what we have now would already be sufficient? I think the vignette gives a good sense of solver performance as is. What do you think?

ricschuster commented 3 years ago

I just remembered that we didn't finish this up yet. If I remember correctly, @jeffreyhanson you had offered to finish the vignette text. Is that still the plan?

jeffreyhanson commented 3 years ago

Yes, that's absolutely correct! Sorry, I forgot about this. I'll finish off the text today.

jeffreyhanson commented 2 years ago

I think I finished off the text, so I'll close this issue.