Closed: mmckerns closed this issue 8 years ago
Mike: This would be great. Let Rory and me know how best to integrate with and borrow from pyina. We offload most of the actual work to IPython parallel and mainly just create batch scripts in ipython-cluster-helper, but we could certainly always use tips, tricks, and testing for better cross-scheduler support, and would be happy to share.
This is a few months off, but we'll be at the pre-BOSC hackathon in July if you're attending ISMB or BOSC this year (http://www.open-bio.org/wiki/Codefest_2014). Thanks for kicking off the discussion.
Brad: BOSC is not my usual conference, but I might do it if my schedule permits. Actually, I may have some bioinformatics work in my near future -- I can discuss over email.
pyina basically sets up mpirun jobs (or scheduler-submitted or other similar jobs) and wraps a multiprocessing interface around them. So at the lowest level, exchanging how we drive the different schedulers and whatnot is a win in itself. The configurations needed for certain national lab machines are also good to share -- I have a few that I don't include, but easily could. Aside from that, what we'd want to do is code to the same API… say, adaptors to the pool and pipe interfaces that multiprocessing uses. I already do this, and do it with pathos, so you'd get the same API for accessing other forms of parallelism (i.e. you can pick the programming model used for execution). It's worth a chat anyway, I think.
Mike: That sounds great. Here is the high-level documentation about how we currently use ipython-cluster-helper in bcbio-nextgen:
https://bcbio-nextgen.readthedocs.org/en/latest/contents/code.html#parallelization-framework
The `prun` section looks similar to what you do with the Torque (or other scheduler) and Mpi classes -- it sets up a parallel environment to run in. Then we use a `run` function in the same way you use `map`. We don't rely on pickling, which we've found gets complex when trying to support both multiprocessing and IPython; instead we have small wrappers that handle setting up the functions for both cases, just calling out to the actual functionality (IPython: https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/distributed/ipythontasks.py and multiprocessing: https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/distributed/multitasks.py).
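A hedged sketch of the "small wrappers instead of pickling" idea (the registry, decorator, and task names below are hypothetical, not bcbio-nextgen's actual code): rather than serializing arbitrary functions, only a task name and plain arguments cross the process boundary, and the worker side looks the function up in a registry.

```python
# Hypothetical illustration: dispatch tasks by name so that only simple,
# easily-serialized values (strings, lists) travel between processes,
# sidestepping the pickling differences between multiprocessing and IPython.

TASKS = {}

def task(fn):
    """Register a function under its name so any worker can find it."""
    TASKS[fn.__name__] = fn
    return fn

@task
def align(sample):
    # stand-in for a real pipeline step
    return "aligned:%s" % sample

def run_task(name, args):
    # only the string `name` and plain `args` need to be serialized
    return TASKS[name](*args)

print(run_task("align", ["sample1"]))  # aligned:sample1
```

Both backends can then share one tiny `run_task`-style entry point, which is the role the ipythontasks.py and multitasks.py wrappers linked above play.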
The other useful abstraction is turning a list of required resources into instructions for the scheduler to create a cluster: run these two programs, where program A needs 3 GB memory/core and 16 cores and program B needs 1 GB memory/core and 8 cores. It handles turning the program specifications into what actually gets sent to the cluster.
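As a rough sketch of that abstraction (function and field names are illustrative, not ipython-cluster-helper's API), one simple policy is to size the cluster to the most demanding program on each axis:

```python
# Hypothetical sketch: collapse per-program resource specs into a single
# cluster request by taking the maximum requirement on each axis.

def cluster_request(programs):
    """Pick cores and memory-per-core that satisfy every program."""
    return {
        "cores": max(p["cores"] for p in programs),
        "mem_per_core_gb": max(p["mem_per_core_gb"] for p in programs),
    }

specs = [
    {"name": "A", "mem_per_core_gb": 3, "cores": 16},
    {"name": "B", "mem_per_core_gb": 1, "cores": 8},
]
print(cluster_request(specs))  # {'cores': 16, 'mem_per_core_gb': 3}
```

For the example from the thread, this yields a 16-core request at 3 GB/core; a real implementation also has to translate that request into each scheduler's batch-script syntax.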
Hope that helps for an overview of what we're doing. Thanks again.
Awesome. Thanks for the nice writeup, I'll have a look.
Hi Mike, looks like this isn't going to happen, so closing it out. Totally happy to reopen later on. Thanks so much!
@roryk: Good move. Hopefully someone becomes annoyed enough to move it off the back burner at some point. Thanks guys, I'll keep an eyeball on your development until then.
pyina supports some schedulers and whatnot that ipython-cluster-helper doesn't, and vice versa: https://github.com/uqfoundation/pyina/blob/master/pyina/launchers.py I also have some machine-specific configs that I keep in a dev branch in my svn.
Both packages are small… we should figure out how to better leverage each other, or at the very least steal from each other mercilessly. This is on my agenda for before summer, depending on proposal and travel commitments. I'm at the labs all the time. Or maybe you guys will be sprinting somewhere I'm attending? Feel free to send email or otherwise get in touch. I know I've brought this up before, but we should do it.