openpathsampling / openpathsampling

An open source Python framework for transition interface and path sampling calculations.
http://openpathsampling.org
MIT License

Consider middleware for HPC resource scenarios #314

Open franknoe opened 9 years ago

franknoe commented 9 years ago

Again, this is a general statement, since I am not yet familiar enough with what OPS has built in; it is based only on discussions with @jhprinz. To run parallel workers in an environment-agnostic way that would work both on local clusters and on HPC resources (e.g., Archer, Titan), it might be worth considering including a generic middleware package such as RADICAL-Pilot:

http://radicalpilot.readthedocs.org/en/latest/
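To illustrate the "pilot job" idea: a pilot acquires a block of resources once (one batch-queue job), and many small tasks are then late-bound onto those slots as they free up, without each task waiting in the cluster queue. The sketch below is a toy, in-process model of that pattern only. It is not RADICAL-Pilot's API, and `run_pilot` and its parameters are hypothetical names; threads stand in for the cores or nodes a real pilot would hold.

```python
import queue
import threading


def run_pilot(tasks, n_slots):
    """Toy pilot job (hypothetical helper, not a real library API).

    Acquire `n_slots` workers once, then let each worker keep pulling
    tasks (zero-argument callables) from a shared queue until it is empty.
    Results are returned in submission order.
    """
    task_queue = queue.Queue()
    for index, task in enumerate(tasks):
        task_queue.put((index, task))
    results = [None] * len(tasks)

    def slot_loop():
        # Late binding: a slot takes whatever task is next, whenever it is free.
        while True:
            try:
                index, task = task_queue.get_nowait()
            except queue.Empty:
                return
            results[index] = task()

    # "Acquiring the pilot": start all slots up front, then wait for them.
    slots = [threading.Thread(target=slot_loop) for _ in range(n_slots)]
    for slot in slots:
        slot.start()
    for slot in slots:
        slot.join()
    return results


if __name__ == "__main__":
    # Ten independent toy tasks spread over three slots.
    squares = run_pilot([lambda i=i: i * i for i in range(10)], n_slots=3)
    print(squares)
```

Independent, uncoupled runs fit this pattern naturally, because any free slot can take any pending task.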

dwhswenson commented 9 years ago

I wasn't aware of this sort of "pilot job" system: there are some cases where it could definitely be useful -- thanks for pointing it out!

In general, parallelization will have to depend a little on the nature of the path sampling method used. Some methods (fixed-length TPS and fixed-bias SRTIS) could really be run as independent processes. At the other extreme, you have things like RETIS, where you run into limits on how many cores you can use -- assuming you want to maintain good load balancing, at least! But the cases that fall between those extremes could benefit from a pilot job system.
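One way to see the load-balancing limit for a RETIS-like scheme: if each synchronous step requires one move per ensemble, the step lasts as long as the most heavily loaded core, and no core count can bring it below the single most expensive ensemble. The sketch below is a toy model under those assumptions. It treats each ensemble as an indivisible task with a fixed, hypothetical cost, assigns tasks with a greedy longest-processing-time heuristic, and reports the fraction of core-time spent on useful work. It is not how OPS schedules anything, and `lpt_loads` and `step_efficiency` are hypothetical names.

```python
import heapq


def lpt_loads(costs, n_cores):
    """Greedy longest-processing-time assignment; returns the load per core."""
    loads = [0.0] * n_cores
    heap = [(0.0, core) for core in range(n_cores)]  # (current load, core id)
    heapq.heapify(heap)
    for cost in sorted(costs, reverse=True):
        load, core = heapq.heappop(heap)  # least-loaded core gets the next task
        load += cost
        loads[core] = load
        heapq.heappush(heap, (load, core))
    return loads


def step_efficiency(costs, n_cores):
    """Useful core-time divided by total core-time for one synchronous step."""
    makespan = max(lpt_loads(costs, n_cores))
    return sum(costs) / (n_cores * makespan)


if __name__ == "__main__":
    # Hypothetical per-ensemble costs: outer interfaces give longer paths.
    costs = [1, 2, 3, 4, 5, 6, 7, 8]
    for n_cores in (2, 4, 8, 16):
        print(n_cores, step_efficiency(costs, n_cores))
```

With these made-up costs, 4 cores balance perfectly (efficiency 1.0), but 8 cores drop to 0.5625 and 16 cores to 0.28125, because every step still waits for the cost-8 ensemble. Fixed-length TPS chains that never synchronize do not have this ceiling.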

(For RETIS, my own suggestion has been to build around a cluster's queue system so that your RETIS calculation's resource requirements can change, freeing resources for other group members; that relies on some kludgy scripts that are, so far, not part of OPS.)

Parallelization was on the "first after 1.0" to-do list. I think it will take some serious brainstorming to figure out a good way to fit it smoothly into the overall OPS structure, but it is also hugely important.

(P.S., mainly for @jchodera: this is my first GitHub activity from an MSKCC computer!)

jchodera commented 9 years ago

> In general, parallelization will have to depend a little on the nature of the path sampling method used.

Can we build a list of use cases (maybe on the wiki pages?) and brainstorm a bit about what kinds of parallelization we might want to use? It might be good to do a bit of requirements capture here before diving into the implementation.

I'm totally OK with this being post-1.0.

dwhswenson commented 9 years ago

> Can we build a list of use cases (maybe on the wiki pages?) and brainstorm a bit about what kinds of parallelization we might want to use?

https://github.com/choderalab/openpathsampling/wiki/Parallelization-Notes

Just some initial thoughts. I need to put together a better description of OneWrapper (probably best to just put its repo on GitHub).

dwhswenson commented 9 years ago

OneWrapper is now on GitHub: https://github.com/dwhswenson/OneWrapper

There are no significant updates to it (I just imported the git repo from my private server to GitHub), so the docs are incomplete. It's also very kludgy and has a lot of server-specific behavior. But the code and docs might explain how it works better than my paragraph on the wiki.

franknoe commented 9 years ago

RADICAL-Pilot is now EnsembleMD:

https://radical-cybertools.github.io/ensemble-md/index.html