ss3sim / ss3sim

An R package for stock-assessment simulation with Stock Synthesis
https://ss3sim.github.io/ss3sim/
Other
39 stars 26 forks source link

parallel computing is out of date #331

Open kellijohnson-NOAA opened 3 years ago

kellijohnson-NOAA commented 3 years ago

Problem

The code for creating parallelism in ss3sim is out of date.

Proposal

No good ideas at the moment. I previously tried to remove the parallel code and users wanted it returned. Any ideas here are highly encouraged :pleading_face:.

Disclaimers

k-doering-NOAA commented 3 years ago

What do you mean by "out of date"? That it is currently using an older pkg for parallel or that the code using parallel is not maintained?

I wonder if we could "test" that it still works by creating a vignette or other .RMD file that will run some code in parallel? That way it would be easy to hit "knit" and run the code. I'm not sure if there is a better way to do this, or if that would work if we ran it on github actions.

kellijohnson-NOAA commented 3 years ago

The code works but might not be the best available option out there. Also, you are correct in that there are no tests of the parallel code right now. Parallel computing is such a hot topic that I feel overwhelmed trying to keep up to date with best practices and worried that the code will fail eventually.

okenk commented 9 months ago

Two thoughts on this:

  1. I have used the {future} approach to parallelization a bit in {r4ss} and like it. It has not gone 100% smoothly, but the user sets in their session how they want functions to run (in parallel or sequentially), and then there is no need for a parallel argument in functions, you just run the future version of a for loop, apply function, purr function, etc. This way the developer does not have to anticipate the types of systems the code will be run on. In addition, it can handle RNG. I think it offloads some of the "I don't know what to do about all these settings!" to the user, and the developer does not have to "keep up," as you said @kellijohnson-NOAA.
  2. I ran 50 OMs and 6 EMs per OM on an Azure VM. The get_results_all() function actually took significantly longer than the simulations. So if you are going to do a refresh on parallel computing within {ss3sim}, I would put in a plug to include get_results_all() in that.
kellijohnson-NOAA commented 9 months ago

I plan on removing all of the parallel computing code that is imbedded within ss3sim and instead providing a vignette that gives users an example of how they can implement it as a wrapper. @okenk perhaps we could write at least the part that does get_results_all() together so you can read in your results faster.