Open franknoe opened 9 years ago
I wasn't aware of this sort of "pilot job" system: there are some cases where it could definitely be useful -- thanks for pointing it out!
In general, parallelization will have to depend a little on the nature of the path sampling method used. Some methods (fixed-length TPS and fixed-bias SRTIS) could really be run as independent processes. At the other extreme, you have things like RETIS, where you run into limits on how many cores you can use -- assuming you want to maintain good load balancing, at least! But the cases that fall between those extremes could benefit from a pilot job system.
(For RETIS, my own suggestion has been to build around a cluster's queue system so that your RETIS calculation's resource requirements can change, making resources available to other group members; so far that is done with some kludgy scripts that are not part of OPS.)
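To make the "independent processes" case concrete, here is a minimal sketch using only the Python standard library. It is not OPS code: `run_independent_sampler` is a hypothetical stand-in for one fixed-length TPS run, and its 0.3 acceptance probability is arbitrary. The only point is that each run takes its own seed and shares no state with the others.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def run_independent_sampler(seed, n_steps=1000):
    """Hypothetical stand-in for one independent path sampling run.

    Returns (seed, fraction of accepted trial moves). A real run would
    drive an MD engine; here a seeded RNG fakes the accept/reject step.
    """
    rng = random.Random(seed)
    accepted = sum(1 for _ in range(n_steps) if rng.random() < 0.3)
    return seed, accepted / n_steps


def run_all(seeds, n_steps=1000, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Launch one run per seed; the runs never communicate with each other."""
    with executor_cls(max_workers=max_workers) as pool:
        futures = [pool.submit(run_independent_sampler, s, n_steps) for s in seeds]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    rates = run_all(range(4))
    for seed, rate in sorted(rates.items()):
        print(f"seed {seed}: acceptance rate {rate:.3f}")
```

Because the runs are independent, the same pattern maps directly onto separate queue jobs; nothing here helps with RETIS, where replicas have to exchange.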
Parallelization was on the "first after 1.0" to-do list. I think it will take some serious brainstorming to figure out a good way to fit it smoothly into the overall OPS structure, but it is also hugely important.
(P.S. -- mainly for @jchodera: this is my first GitHub activity from an MSKCC computer!)
> In general, parallelization will have to depend a little on the nature of the path sampling method used.

Can we build a list of use cases (maybe on the wiki pages?) and brainstorm a bit about what kinds of parallelization we might want to use? It might be good to do a bit of requirements capture here first before diving into the implementation.
I'm totally OK with this being post-1.0.
> Can we build a list of use cases (maybe on the wiki pages?) and brainstorm a bit about what kinds of parallelization we might want to use?

https://github.com/choderalab/openpathsampling/wiki/Parallelization-Notes
Just some initial thoughts. I need to put together a better description of OneWrapper (probably best to just put its repo on GitHub).
OneWrapper is now on GitHub: https://github.com/dwhswenson/OneWrapper
No significant updates to it (just imported my private-server git repo to GitHub), and so the docs are incomplete. It's also very kludgy and has a lot of server-specific behaviors. But the code/docs might be enough to explain how it works better than my paragraph on the wiki.
Radical-Pilot is now EnsembleMD:
Again, this is a general statement, as I am not yet familiar enough with what OPS has built in; it is based only on discussions with @jhprinz. To run parallel workers in an environment-agnostic way that works on both local clusters and HPC resources (e.g. Archer, Titan), it might be worth including a generic middleware package such as Radical-Pilot:
http://radicalpilot.readthedocs.org/en/latest/
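For readers unfamiliar with the pilot-job idea, the pattern can be sketched roughly as follows. This is a generic illustration using only the standard library and is not the Radical-Pilot API: the `pilot` function and its arguments are hypothetical names. A fixed set of workers (standing in for one cluster allocation) pulls tasks from a shared queue as each worker becomes free, so many small tasks run inside a single reservation.

```python
import queue
import threading


def pilot(tasks, n_workers):
    """Run zero-argument callables on a fixed pool of n_workers threads.

    Hypothetical sketch of the pilot-job pattern: the pool stands in for
    one resource allocation, and the tasks are scheduled inside it.
    Results are returned in the order the tasks were given.
    """
    tasks = list(tasks)
    todo = queue.Queue()
    for index, task in enumerate(tasks):
        todo.put((index, task))
    results = [None] * len(tasks)

    def worker():
        # Each worker keeps pulling work until the queue is empty.
        while True:
            try:
                index, task = todo.get_nowait()
            except queue.Empty:
                return
            results[index] = task()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    jobs = [lambda k=k: k * k for k in range(10)]
    print(pilot(jobs, n_workers=3))
```

A real pilot system would also handle task failures, remote execution, and staging of data, none of which this sketch attempts.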