YANK workflow representation

jdakka commented 6 years ago

@andrrizzi

Could you take a look at the diagrammatic representation of YANK:

https://docs.google.com/drawings/d/1ds1eRqqCM9T_gEklX1a1f3I1ojx6RIk4JsgBNlDT4Bg/edit?usp=sharing

Feel free to add/modify/comment

andrrizzi commented 6 years ago

That looks about right!

To be more precise, the "gathering" and "replica exchange" steps can be further subdivided. Currently, this is how it goes:

1) After propagation, the positions are gathered onto the MPI node with rank 0. The only purpose of this gathering step is to write them to the netcdf database, so depending on how we solve the writing-to-disk problem, it may not be necessary.
2) The energy matrix is computed in parallel. Each MPI process is assigned n_replicas / n_mpi_processes rows of the matrix.
3) There's another gathering step to collect all the energy matrix rows onto MPI node 0.
4) MPI node 0 performs Gibbs sampling using the full energy matrix and generates a new permutation of the state vector.
5) MPI node 0 sends the state vector to all MPI processes, and the new states are assigned to replicas.

The order of 1) and 2) can be interchanged, if that makes it easier.
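
The steps above can be sketched in plain Python, with the MPI ranks simulated by a loop so that it runs in a single process. This is only an illustration under stated assumptions, not YANK's code: `reduced_potential`, `rows_for_rank`, `mix_states`, and `exchange_iteration` are hypothetical names, the energy is a toy harmonic function, rows are handed out round-robin, and the Gibbs sampling step is stood in for by repeated pairwise Metropolis swap attempts.

```python
import math
import random

def reduced_potential(state_k, position):
    """Toy reduced energy of `position` in thermodynamic state k (hypothetical)."""
    spring = 1.0 + state_k  # each state gets a different spring constant
    return 0.5 * spring * position ** 2

def rows_for_rank(rank, n_ranks, n_replicas):
    """Energy matrix rows (replica indices) handled by one simulated MPI rank."""
    return list(range(rank, n_replicas, n_ranks))

def mix_states(energy, states, n_attempts, rng):
    """Rank-0 step: permute the state vector with pairwise Metropolis swaps."""
    states = list(states)
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(states)), 2)
        si, sj = states[i], states[j]
        # Log acceptance probability for swapping the states of replicas i and j.
        log_p = (energy[i][si] + energy[j][sj]) - (energy[i][sj] + energy[j][si])
        if log_p >= 0 or rng.random() < math.exp(log_p):
            states[i], states[j] = sj, si
    return states

def exchange_iteration(positions, states, n_ranks, rng):
    n_replicas = len(positions)
    # Each rank fills its own rows; writing into one list plays the role of
    # gathering the rows onto rank 0.
    energy = [None] * n_replicas
    for rank in range(n_ranks):
        for i in rows_for_rank(rank, n_ranks, n_replicas):
            energy[i] = [reduced_potential(k, positions[i]) for k in range(n_replicas)]
    # Rank 0 generates a new permutation of the state vector.
    new_states = mix_states(energy, states, n_attempts=n_replicas ** 3, rng=rng)
    # In a real run, this vector would now be broadcast to all ranks.
    return energy, new_states

rng = random.Random(0)
positions = [rng.gauss(0.0, 1.0) for _ in range(6)]
energy, new_states = exchange_iteration(positions, list(range(6)), n_ranks=3, rng=rng)
print(new_states)
```

With 6 replicas on 3 simulated ranks, each rank computes 2 rows, and the returned state vector is always a permutation of the original one.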

jchodera commented 6 years ago

I wonder if it would be more scalable (and also more scientifically interesting) to focus on SAMS instead of replica-exchange here for free energy calculations. That will greatly simplify the workflows for each ligand or mutant.

andrrizzi commented 6 years ago

Definitely! Depending on our time constraints though, I'm not sure I'll be able to include this in YANK before late Jan. Unless you're thinking about doing it outside YANK.

jchodera commented 6 years ago

Since EnsembleTk is python 2.7-only right now, it sounds like we might have to build a special 2.7-compatible code anyway if we wanted to run with EnsembleTk. Our best bet might be to try to get everything we can shoved into openmmtools and prepare special SAMS and Perses workflows with EnsembleTk where we can set up the systems with YANK and Perses outside of the workflow and then run the calculations using the bare minimum of code.

That would only work if most of the code we need in openmmtools is python 2.7-compatible, however. Since we switched to python 3-only, I'm not sure which parts of openmmtools would be OK to use in python 2.7, which is really what the feasibility of running anything with EnsembleTk depends on.

andrrizzi commented 6 years ago

I believe openmmtools is one of the few packages we have around that is still compatible with both python 2.7 and python 3 so that would definitely make things easier!

jchodera commented 6 years ago

Wow, you're right! We're even still testing it on python 2.7! https://travis-ci.org/choderalab/openmmtools

OK, I think this is the best plan, then.

@jdakka : Any chance you can swing by MSKCC once more before Patrick leaves? That would at least let us come up with some minimal workflows for you to test and help us identify all the components we need to move to openmmtools to make this work.

jchodera commented 6 years ago

(I'm back in NYC on Thu and Fri this week)

shantenujha commented 6 years ago

@jdakka - that would be good.

@jchodera: apologies, I won't be able to do so, as I'm in Brussels + Reykjavik this week and early next.

jdakka commented 6 years ago

@andrrizzi picking up where we left off: did you get a chance to try the RADICAL-Pilot examples? I'll update the workflow diagram based on feedback. We can work together to get YANK integrated into EnTK. I'd be fine with coming to Sloan once you feel that we're at a good point to hack on it.

andrrizzi commented 6 years ago

@jdakka sorry, I've been crazy busy with a couple of deadlines. By the end of next week I should be able to go through the examples and find any problems that we'll have to solve.

jdakka commented 6 years ago

@andrrizzi based on the pseudo code you provided, I had a question about the compute_energy function, compute_energy(thermodynamic_state, sampler_state): is this another MD/MC step?

andrrizzi commented 6 years ago

Nope! This only computes the energy of the system given the alchemical lambda parameters, which are in the thermodynamic_state object, and positions/box vectors in sampler_state.
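
To illustrate, here is a toy version of such a function. It is not the YANK or openmmtools implementation: `ToyThermodynamicState` and `ToySamplerState` are hypothetical stand-ins, and the lambda-scaled harmonic potential is an assumption chosen only to show that the call evaluates an energy without propagating anything.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ToyThermodynamicState:
    lambda_value: float  # alchemical parameter in [0, 1]
    kT: float = 1.0      # thermal energy, arbitrary units

@dataclass(frozen=True)
class ToySamplerState:
    positions: List[float]  # 1D coordinates standing in for positions/box vectors

def compute_energy(thermodynamic_state, sampler_state):
    """Return the reduced energy; no MD/MC move is made and nothing is mutated."""
    potential = sum(0.5 * thermodynamic_state.lambda_value * x ** 2
                    for x in sampler_state.positions)
    return potential / thermodynamic_state.kT

# One configuration evaluated at several alchemical states,
# which is what one row of the energy matrix amounts to.
sampler_state = ToySamplerState(positions=[1.0, -2.0])
energies = [compute_energy(ToyThermodynamicState(lam), sampler_state)
            for lam in (0.0, 0.5, 1.0)]
print(energies)
```

The positions are identical before and after the calls; only the lambda value changes between evaluations.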

jdakka commented 6 years ago

@andrrizzi I'm looking for a few more details to understand how we can best represent YANK:

- Since YANK is a one-world MPI application, and the rank is dependent on the available n_processes and m_replicas, could you roughly estimate n and m for a single YANK protocol instance? I assume this also heavily depends on the mutation/physical system, but could you provide a lower/upper bound?
- Also, roughly what is the number of mutations? Does each YANK protocol instance reference a different mutation?

andrrizzi commented 6 years ago

I may have misunderstood, but in a normal execution of YANK, the MPI rank is dependent on n_processes and the option processes_per_experiment (in this case, instead of experiment you can think "mutant"). YANK then automatically splits the MPI world into n_processes / processes_per_experiment communicators, and it runs each mutant in its own communicator. The m_replicas replicas of the mutant are parallelized over the processes_per_experiment GPUs of each communicator.
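
As a rough sketch of that layout (not YANK's actual code), the mapping from world ranks to communicators can be written out in plain Python. The names `split_world` and `replicas_for_local_rank` are hypothetical, and the contiguous grouping of ranks, the round-robin replica assignment, and the divisibility check are all assumptions of this sketch. With mpi4py, the resulting communicator index is the kind of value one would pass as the color to `MPI.COMM_WORLD.Split`.

```python
def split_world(n_processes, processes_per_experiment):
    """Map each world rank to (communicator index, rank inside that communicator)."""
    if n_processes % processes_per_experiment != 0:
        # Assumption of this sketch: the world divides evenly into groups.
        raise ValueError("n_processes must be a multiple of processes_per_experiment")
    return {rank: divmod(rank, processes_per_experiment)
            for rank in range(n_processes)}

def replicas_for_local_rank(local_rank, processes_per_experiment, m_replicas):
    """Replicas of one mutant handled by one process (GPU) of its communicator."""
    return list(range(local_rank, m_replicas, processes_per_experiment))

layout = split_world(n_processes=8, processes_per_experiment=4)
n_communicators = len({color for color, _ in layout.values()})
print(n_communicators)                           # number of mutants run at once
print(layout[5])                                 # (communicator, local rank) of world rank 5
print(len(replicas_for_local_rank(0, 4, 80)))    # replicas per process for m_replicas=80
```

With 8 processes and 4 processes per experiment, two mutants run concurrently, and an 80-replica mutant puts 20 replicas on each process.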

> could you provide a lower/upper bound?

n_processes is really up to us. In theory, we can use as many processes in parallel as we need/can for a single execution. m_replicas for a kinase simulation would probably be within [70, 90].

> also, roughly what is the number of mutations?

@steven-albanese probably has a better idea than me about this.

> Does each Yank protocol instance reference a different mutation?

This is entirely up to us, actually. We can decide how many mutations we want to run in a single YANK execution. If n_processes / processes_per_experiment < n_mutants, YANK just runs the mutants that have been assigned to this execution sequentially.
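
For example, the sequential case could look like the sketch below. The function `schedule_mutants` and the mutant labels are hypothetical, and the round-robin assignment of mutants to communicators is an assumption made for illustration; I'm not claiming this is the order YANK uses internally.

```python
def schedule_mutants(mutants, n_processes, processes_per_experiment):
    """Give each communicator a queue of mutants to run one after another."""
    n_communicators = n_processes // processes_per_experiment
    if n_communicators < 1:
        raise ValueError("need at least processes_per_experiment processes")
    queues = [[] for _ in range(n_communicators)]
    for index, mutant in enumerate(mutants):
        # Assumed round-robin distribution over the available communicators.
        queues[index % n_communicators].append(mutant)
    return queues

mutants = ["mutant_0", "mutant_1", "mutant_2", "mutant_3", "mutant_4"]
queues = schedule_mutants(mutants, n_processes=8, processes_per_experiment=4)
for communicator, queue in enumerate(queues):
    print(communicator, queue)
```

Here 5 mutants share 2 communicators, so one communicator runs 3 mutants back to back and the other runs 2.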
