samplchallenges / SAMPL6

Challenge inputs, details, and results for the SAMPL6 series of challenges
https://samplchallenges.github.io

Finalize instructions for SAMPLing challenge #9

Closed: davidlmobley closed this issue 7 years ago

davidlmobley commented 7 years ago

This SAMPL, we are including a SAMPLing challenge focused on the convergence of specific calculations with shared inputs (see #4). Right now we are focusing on host-guest binding prediction. We're still discussing in #4 exactly what compounds to use and how to handle relative vs. absolute calculations, but we can also think about what instructions we will give and what we want people to submit.

Here's what I propose:

- We provide input files, and participants run five replicas of each calculation, starting the five replicas from different docked poses of the compounds.
- Participants report the value estimated using first 1%, first 2%, first 3%, ... full set of their data.
- We ask people to also submit total computer time used, total wallclock time used, total number of energy evaluations used, hardware used.

How should we handle the issue @jmichel80 raised about many short runs vs. long runs? Maybe we give people the freedom to follow the above but make multiple submissions, so they could use a "single long run" protocol or a "many short runs" protocol but still report in the same way. In the "many short runs" approach, the "first 1% of the data" would represent many copies of data from a short amount of time, whereas in the "single long run" case it would be a longer stretch of a single simulation.
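For concreteness, here's a minimal sketch of what reporting "in the same way" could look like under the two protocols (hypothetical helper functions; the mean is just a stand-in for whatever estimator a method actually uses):

```python
import numpy as np

def sequential_estimates(samples, n_fractions=100):
    """Single long run: estimate from the first 1%, 2%, ..., 100% of the
    data, taken in the order it was generated (no shuffling)."""
    out = []
    for k in range(1, n_fractions + 1):
        n = max(1, int(k * len(samples) / n_fractions))
        out.append(np.mean(samples[:n]))  # placeholder estimator
    return np.array(out)

def pooled_sequential_estimates(replicas, n_fractions=100):
    """Many short runs: pool the first k% of *each* replica, so the
    "first 1% of the data" is many short stretches rather than one
    longer one."""
    out = []
    for k in range(1, n_fractions + 1):
        chunks = [r[: max(1, int(k * len(r) / n_fractions))] for r in replicas]
        out.append(np.mean(np.concatenate(chunks)))
    return np.array(out)
```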

@andrrizzi, other thoughts?

andrrizzi commented 7 years ago

> starting the five replicas from different docked poses of the compounds

Sounds good. So, to be sure I understand correctly, we will generate and provide the input files for all 5 docked poses?

> value estimated using first 1%, first 2%, first 3%, ... full set?

Should these estimates use the data in sequence or bootstrapped? I would also be very much interested in collecting correlation-time data for all the methods. I realize it may not be trivial to formulate a uniform way to compute these statistics for all methodologies when multiple time series are involved, though (@jchodera might have input on this).

> We ask people to also submit total computer time used, total wallclock time used, total number of energy evaluations used, hardware used

It would probably be hard to use this for comparing methods as different hardware/implementations affect these numbers so much. It could still be valuable data for providing rules of thumb though!

Technical question: what kind of input files should we provide for relative calculations? Is one of the solvated systems enough? If both are necessary, I'll probably have to either switch to packmol or edit the tleap output files to use a precise number of water molecules, since the number of solvent degrees of freedom should probably stay constant at the endpoints.
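If the tleap-editing route turns out to be necessary, here's a minimal sketch of one way to do it with ParmEd (file names and the target water count are hypothetical, and the residues stripped here are simply the last waters in the file):

```python
import parmed

TARGET_WATERS = 1500  # hypothetical; both endpoints would use the same value

parm = parmed.load_file("solvated.prmtop", xyz="solvated.inpcrd")
# Amber masks are 1-indexed, so offset the residue indices by one
waters = [i + 1 for i, res in enumerate(parm.residues) if res.name in ("WAT", "HOH")]
excess = len(waters) - TARGET_WATERS
if excess > 0:
    # strip the last `excess` water residues so both endpoints match
    parm.strip(":" + ",".join(str(i) for i in waters[-excess:]))
parm.save("solvated_fixed.prmtop", overwrite=True)
parm.save("solvated_fixed.inpcrd", overwrite=True)
```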

davidlmobley commented 7 years ago

@andrrizzi

> Sounds good. So, to be sure I understand correctly, we will generate and provide the input files for all 5 docked poses?

Yes. Alternatively, if we can't generate 5 by docking, we can use some other means (e.g., dock a couple, equilibrate, and pull snapshots).

> > value estimated using first 1%, first 2%, first 3%, ... full set?
>
> Should these estimates use the data in sequence or bootstrapped? I would also be very much interested in collecting correlation-time data for all the methods. I realize it may not be trivial to formulate a uniform way to compute these statistics for all methodologies when multiple time series are involved, though (@jchodera might have input on this).

In sequence; I think bootstrapping would partly defeat the purpose. Part of the idea here is that there may be rather long-timescale events, and analyzing in sequence addresses that, whereas bootstrapping doesn't necessarily.

How would you want to get at correlation times? Provide a library for people to run and ask them to report this?

> > We ask people to also submit total computer time used, total wallclock time used, total number of energy evaluations used, hardware used
>
> It would probably be hard to use this for comparing methods as different hardware/implementations affect these numbers so much. It could still be valuable data for providing rules of thumb though!

Yes. The thought is not so much to use this for quantitative comparison as to get this on the radar and be able to say qualitative things about it and, as you say, perhaps provide some rules of thumb.

> Technical question: what kind of input files should we provide for relative calculations? Is one of the solvated systems enough? If both are necessary, I'll probably have to either switch to packmol or edit the tleap output files to use a precise number of water molecules, since the number of solvent degrees of freedom should probably stay constant at the endpoints.

Hmm, that's a good question. Let me loop in Bryce at Silicon...

brycestx commented 7 years ago

@davidlmobley

> > Technical question: what kind of input files should we provide for relative calculations? Is one of the solvated systems enough? If both are necessary, I'll probably have to either switch to packmol or edit the tleap output files to use a precise number of water molecules, since the number of solvent degrees of freedom should probably stay constant at the endpoints.
>
> Hmm, that's a good question. Let me loop in Bryce at Silicon...

I would think we could provide the inputs after atom mapping. For AMBER relative binding free energy calculations with pmemd, we use a single topology that contains the solvated perturbation (ligand A to B). Therefore only a single topology file, coordinate file, and config file would be necessary for reproducibility.

davidlmobley commented 7 years ago

Bryce and I discussed further offline, and he confirmed that one needs ligand A and ligand B setups (for a relative calculation) that have the same number of water molecules, at least for AMBER.

That said, @brycestx - if we wanted to make it easy for someone like you guys to do a relative free energy calculation, what exact inputs would you want? Would you want solvated prmtop/coordinate files for ligand A and B in complex and in solution as well as the mapping of atoms perturbed/removed/inserted? Or would you want just the ligands posed in the "receptor" without solvent, or...? Having not done relative free energies with AMBER I'm not actually sure what you'd want. (For GROMACS, where I HAVE done them, I'd probably want the coordinate/topology files (I could get by with having them for just one ligand) and the atom mapping and could generate relative inputs from there.)

jmichel80 commented 7 years ago

@davidlmobley

IMO it is tricky to provide a consistent set of topologies for relative calcs, because different packages have different mechanics for handling dummy atoms (explicit or not), bonded terms involving dummy atoms, and masses of perturbed atoms; and the structure of the input files for a given code may differ depending on the chosen lambda schedule.

You could leave it at providing coordinates/topology for A or B and let others work out for themselves how to map atoms in A to B and adjust the inputs to their code.

@halx may be available to chime in...

davidlmobley commented 7 years ago

@jmichel80 - I'm thinking to NOT provide actual inputs for relative calculations, just to provide A and B topologies and possibly a proposed mapping. Does that sound reasonable?

(And are you on board with the rest of this and #4?)

davidlmobley commented 7 years ago

Ah, sorry, I'd missed your response on #4. OK, so we're almost set then.

andrrizzi commented 7 years ago

Ok!

> How would you want to get at correlation times? Provide a library for people to run and ask them to report this?

Possibly! `pymbar.timeseries` has a function to compute the integrated autocorrelation time of an arbitrary sequence. The main problem I'm referring to is: which time series would we ask people to consider? For a replica exchange simulation with 20 lambda windows, for example, should we ask for 20 correlation times or a single one (computed by summing all potential energies at each iteration)? I'm not sure if there are caveats we need to consider when it comes to comparing different correlation times of different methods. Probably, for MD, we would need the time between samples.
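For reference, a minimal sketch of the per-window calculation with pymbar (camelCase API as in pymbar 3, current at the time; the data array and its shape are hypothetical stand-ins):

```python
import numpy as np
from pymbar import timeseries

# u_kt[k, t]: potential energy time series at lambda window k, iteration t
u_kt = np.random.randn(20, 5000)  # stand-in data

# Option 1: one integrated autocorrelation time per lambda window.
# statisticalInefficiency returns g = 1 + 2*tau (tau in sampling intervals).
taus = [(timeseries.statisticalInefficiency(u_t) - 1.0) / 2.0 for u_t in u_kt]

# Option 2: a single correlation time from the summed potential energies.
g = timeseries.statisticalInefficiency(u_kt.sum(axis=0))
tau_total = (g - 1.0) / 2.0
```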

> provide A and B topologies and possibly a proposed mapping.

This sounds feasible to me. I could make the mapping available in JSON format.
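As a sketch, the mapping file could be as simple as this (all field names and file names are illustrative, not a settled format):

```python
import json

# Hypothetical A->B atom map for one relative transformation: mapped atom
# indices plus the atoms deleted/inserted, so each code can decide how to
# treat dummy atoms on its own.
mapping = {
    "ligand_A": "guest-1.mol2",
    "ligand_B": "guest-2.mol2",
    "atom_map": {"0": 0, "1": 1, "2": 2},  # index in A -> index in B
    "unmapped_A": [12, 13],                # deleted going A -> B
    "unmapped_B": [14],                    # inserted going A -> B
}
with open("mapping_A_to_B.json", "w") as f:
    json.dump(mapping, f, indent=2)
```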

davidlmobley commented 7 years ago

Yes, I agree that replica exchange time series can have relatively complex issues here. The main thing to think about is what would be useful to analyze on your end if you get all this from people...

zjing7 commented 7 years ago

You would want to compute the correlation times for the thermodynamic state of interest. There is no caveat as long as the correlation coefficient goes to 0. When comparing correlation times of replica exchange with other methods, replica exchange should give smaller values. However, keep in mind that replica exchange also uses more resources: for temperature RE with N replicas where only one thermodynamic state is of interest, the computational cost increases N-fold. The best way, I think, is to report the error for a fixed total simulation time, e.g. 10 ns.


davidlmobley commented 7 years ago

@zjing7:

> When comparing correlation times of replica exchange with other methods, replica exchange should give smaller values. However, keep in mind that replica exchange also uses more resources.

Part of what we're discussing (which we should have spelled out) is that with replica exchange, you will end up with artificially low apparent correlation times if you are analyzing one particular thermodynamic state, because of swaps between replicas. For example, imagine free energy calculations where we are analyzing a lambda=0 state and every 10 ps one of the simulations run at other states swaps into lambda=0; even if every individual replica is stuck in a different conformation for your entire simulation, your apparent correlation time may end up being something like 10 ps. That's sort of a worst-case scenario, I suppose, but hopefully it illustrates the point.

Alternatively, one can compute the correlation time on a single simulation as it walks between thermodynamic states, but this is also not really what you want as the correlation time may be a function of thermodynamic state. So, it's complex.

zjing7 commented 7 years ago

Yes, I understood your point. But statistical errors can be rigorously calculated from the correlation time, as long as the same time series is used for computing the correlation time and for calculating the quantity of interest. So the decrease in correlation time is not an artifact; it reflects the fact that as more replicas contribute to one thermodynamic state, the statistical error is reduced. Certainly these additional replicas do not come for free, so this has to be taken into account: to make a fair comparison, we have to multiply the computed correlation time by N (the number of replicas), since for every time step we are actually simulating N time steps. Nevertheless, as I said before, this is not as straightforward as reporting the statistical error for a given total simulation time.
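In symbols (the standard result for a time average of a stationary series; the replica-exchange normalization is the N-fold multiplication described above):

```latex
% Standard error of a time average \bar{A} over run length T with
% integrated correlation time \tau_A (valid for T >> \tau_A).
\delta \bar{A} \approx \sigma_A \sqrt{\frac{2 \tau_A}{T}},
\qquad \tau_{\mathrm{eff}} = N \, \tau_A
% Cost-normalizing N-replica exchange at fixed aggregate effort amounts
% to using \tau_{\mathrm{eff}} in place of \tau_A.
```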


brycestx commented 7 years ago

@davidlmobley

> That said, @brycestx - if we wanted to make it easy for someone like you guys to do a relative free energy calculation, what exact inputs would you want? Would you want solvated prmtop/coordinate files for ligand A and B in complex and in solution as well as the mapping of atoms perturbed/removed/inserted? Or would you want just the ligands posed in the "receptor" without solvent, or...? Having not done relative free energies with AMBER I'm not actually sure what you'd want. (For GROMACS, where I HAVE done them, I'd probably want the coordinate/topology files (I could get by with having them for just one ligand) and the atom mapping and could generate relative inputs from there.)

Yes, precisely. Having the coordinate/topology files for just one ligand is all we need to generate new relative perturbations.

jchodera commented 7 years ago

I concur with @zjing7's points.

The ideal outcome of this challenge is to be able to produce plots of the expected bias, sqrt(variance), and total error estimates as a function of total simulation effort (with confidence intervals) for each method. One way to do this would be to run each method many times from one or more initial conditions, but the number of independent replicates would need to be large: ideally ~20 for the 68% confidence intervals to be known to one digit of precision.
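As a sketch of the analysis this would enable (array layout, names, and the reference value are hypothetical):

```python
import numpy as np

def error_vs_effort(estimates, reference):
    """estimates[r, k]: free-energy estimate from replicate r at effort
    level k; reference: a well-converged benchmark value.
    Returns bias, spread, and total (RMS) error at each effort level."""
    bias = estimates.mean(axis=0) - reference
    spread = estimates.std(axis=0, ddof=1)  # sqrt(variance) across replicates
    rmse = np.sqrt(bias**2 + spread**2)
    return bias, spread, rmse
```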

This is impractical for most cases, so the best alternative would seem to be, for those methods that assume stationarity (such as Hamiltonian replica exchange), to just run a very long simulation from which the statistics of the estimators and multicanonical correlation times can be extracted and used to estimate these plots. For adaptive methods (e.g. simulated scaling, SAMS), however, this will not work because these methods are not stationary Markov chains.

The idea of running ~5 independent simulations and providing the estimates of each as a function of simulation effort sounds like a reasonable compromise, but how will we quantify simulation effort? Wall clock time very much depends on hardware and resource availability. Number of force evaluations? Aggregate ns of simulation time?

davidlmobley commented 7 years ago

> how will we quantify simulation effort? Wall clock time very much depends on hardware and resource availability. Number of force evaluations? Aggregate ns of simulation time?

I think we want to get multiple metrics of simulation effort: number of force evaluations, wall clock time (and hardware used), aggregate ns, etc. We're not going to declare winners and losers or rank performance here; we're just starting to gather data so we can compare qualitatively and (to the extent possible) quantitatively. Hopefully this can help set the stage for future challenges where we can gradually begin moving toward ranking performance.
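For instance, a submission's effort metadata could be as simple as this (a hypothetical record; field names and values are illustrative only):

```python
effort_record = {
    "method": "HREX/MBAR",              # free-text method label
    "n_energy_evaluations": 2.4e9,      # total force/energy evaluations
    "aggregate_ns": 500.0,              # total simulation time over all replicas
    "wallclock_hours": 36.5,
    "total_cpu_hours": 1168.0,          # total computer time
    "hardware": "1x GTX 1080 GPU + 4 CPU cores",
}
```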

davidlmobley commented 7 years ago

Resolved by #11

@jmichel80 - files are now up/available online. I'll do a point release momentarily so you have a specific version to refer to.